PARC Reference

PARC is the source frontend of the toolchain. The real crate surface today is:

  • preprocessing through both external-driver and built-in paths
  • C parsing into a typed AST
  • extraction into a durable source IR
  • header scanning that goes straight to SourcePackage
  • AST-oriented support APIs such as visiting, spans, locations, and printing

That means the crate serves two audiences at once:

  1. downstream tools that want parc::ir::SourcePackage
  2. parser-facing tools that want direct AST access

What PARC Owns

  • preprocessing
  • parsing
  • parser recovery
  • source extraction
  • source diagnostics and provenance
  • source IR
  • header scanning
  • AST traversal and debug support

What PARC Does Not Own

  • symbol inventories
  • binary validation
  • link-plan construction
  • Rust lowering or crate emission

Actual Data Flow

raw source / headers
  -> driver or built-in preprocessor
  -> parser AST
  -> extraction
  -> SourcePackage
  -> serialized source artifact or downstream harness

scan short-circuits that flow into one high-level operation, while parse and driver expose the earlier stages for syntax-level consumers.

Module Layout

Module        What it is actually for
driver        file-oriented parse flow using an external preprocessor
preprocess    built-in preprocessing, tokenization, include resolution
parse         fragment parsing and direct translation-unit parsing from strings
scan          end-to-end header scanning into SourcePackage
extract       AST-to-IR lowering and normalization
ir            durable PARC-owned source contract
ast           syntax tree for parser-facing consumers
visit         traversal hooks over the AST
span / loc    source-position helpers
print         debug-oriented AST printer
intake        already-preprocessed source intake helpers

Boundary

The strongest consumer boundary is parc::ir::SourcePackage.

That is the point where PARC stops owning the problem. Anything involving binary evidence or Rust generation is downstream from PARC, even if tests and harnesses compose those crates together elsewhere.

Reading Strategy

Read the book in one of these orders:

  1. source-contract path: Getting Started -> Source IR -> Extraction -> Header Scanning -> API Contract
  2. parser-facing path: Getting Started -> Driver API -> Parser API -> AST Model -> Visitor Pattern
  3. contributor/debug path: Project Layout -> Testing -> Diagnostics And Printing -> Parser Boundaries

Getting Started

This chapter is the shortest path from real source or headers to something that PARC actually produces today: either a parsed AST or a SourcePackage.

Read parc as the source frontend of the toolchain:

  • parc owns preprocessing, parsing, extraction, and source diagnostics
  • linc owns link and binary evidence
  • gerc owns Rust lowering and emitted build output

The boundary rule is strict: parc/src/** must not depend on linc or gerc, and any cross-package translation belongs only in tests, examples, or external harnesses.

Add the crate

[dependencies]
parc = { path = "../parc" }

Pick the right API first

Use parc::driver when you have a file on disk and want PARC to run a system preprocessor first.

use parc::driver::{parse, Config};

fn main() -> Result<(), parc::driver::Error> {
    let config = Config::default();
    let parsed = parse(&config, "src/tests/files/minimal.c")?;

    println!("preprocessed bytes: {}", parsed.source.len());
    println!("top-level items: {}", parsed.unit.0.len());
    Ok(())
}

Use parc::parse when you already have source text in memory and want to parse a fragment directly.

use parc::driver::Flavor;
use parc::parse;

fn main() {
    let expr = parse::expression("a + b * 2", Flavor::StdC11).unwrap();
    println!("{:#?}", expr);
}

Choose a language flavor

PARC supports three parser modes:

Flavor      Meaning
StdC11      Strict C11
GnuC11      C11 plus GNU syntax such as typeof, attributes, statement expressions, and GNU asm
ClangC11    C11 plus Clang-oriented extensions such as availability attributes

For file-based parsing, Config::default() selects:

  • clang -E on macOS
  • gcc -E on other targets

You can also select explicitly:

#![allow(unused)]
fn main() {
use parc::driver::Config;

let gnu = Config::with_gcc();
let clang = Config::with_clang();
}

First useful parse example

This example parses a translation unit through the normal driver path:

use parc::driver::{parse, Config};

fn main() -> Result<(), parc::driver::Error> {
    let parsed = parse(&Config::default(), "src/tests/files/minimal.c")?;

    for (i, item) in parsed.unit.0.iter().enumerate() {
        println!("item #{i}: {:?}", item.node);
    }

    Ok(())
}

First useful scan example

If what you really want is source IR rather than a raw AST, start with parc::scan:

#![allow(unused)]
fn main() {
use parc::scan::{scan_headers, ScanConfig};

let config = ScanConfig::new().entry_header("demo.h");
let result = scan_headers(&config).unwrap();

println!("items: {}", result.package.items.len());
}

This is the closest thing PARC has to a “frontend product” API.

First fragment example

If you only need one declaration or statement, the direct parser API is faster to wire in:

use parc::driver::Flavor;
use parc::parse;

fn main() {
    let decl = parse::declaration("static const int answer = 42;", Flavor::StdC11).unwrap();
    let stmt = parse::statement("return answer;", Flavor::StdC11).unwrap();

    println!("{:#?}", decl);
    println!("{:#?}", stmt);
}

Architectural boundary

parc is the source frontend.

It owns:

  • preprocessing
  • parsing
  • source extraction
  • source diagnostics
  • the parc::ir::SourcePackage artifact

It does not own:

  • symbol inventory
  • binary validation
  • link planning
  • Rust code generation

In this repository, cross-package composition should not live in parc library code. linc and gerc should consume parc output only from tests, examples, or external harnesses.

Common Workflows

Most confusion with PARC comes from choosing the wrong entry point. This chapter maps common tasks to the right API.

Read the workflows in this order:

  1. prefer source/frontend workflows that stay inside parc
  2. serialize SourcePackage when another tool needs the result
  3. keep any cross-package translation in tests, examples, or external harnesses

Workflow selection

Situation                                                                   API
Turn headers into SourcePackage                                             scan::scan_headers
Parse a .c or .h file with includes and macros                              driver::parse
Parse already-preprocessed text from memory                                 driver::parse_preprocessed
Parse one expression, declaration, statement, or translation unit string    parse::*
Walk an AST you already parsed                                              visit
Print an AST for debugging                                                  print::Printer

Scan headers into source IR

Use this when your real target is the PARC source contract rather than the raw syntax tree.

#![allow(unused)]
fn main() {
use parc::scan::{scan_headers, ScanConfig};

let result = scan_headers(&ScanConfig::new().entry_header("demo.h")).unwrap();
println!("diagnostics: {}", result.package.diagnostics.len());
}

This is the best fit for downstream toolchains that want declarations, provenance, macros, and diagnostics in one package.

Parse a real file

Use this when your source depends on #include, #define, or compiler predefined macros.

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};

let config = Config::default();
let parsed = parse(&config, "src/main.c")?;
Ok::<(), parc::driver::Error>(())
}

This gives you:

  • parsed.source: the preprocessed source text
  • parsed.unit: the AST root

Parse preprocessed text

Use this when another tool already ran preprocessing and you only want PARC to parse.

#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};

let config = Config::default();
let source = r#"
# 1 "generated.i"
typedef int count_t;
count_t answer(void) { return 42; }
"#
.to_string();

let parsed = parse_preprocessed(&config, source)?;
Ok::<(), parc::driver::SyntaxError>(())
}

This is useful for:

  • snapshot-based tests
  • integration with custom build systems
  • reproducing parse bugs from stored .i files

Parse a fragment

Use parc::parse when you are not dealing with a whole file.

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let expr = parse::expression("ptr->len + 1", Flavor::GnuC11)?;
let decl = parse::declaration("unsigned long flags;", Flavor::StdC11)?;
let stmt = parse::statement("if (ok) return 1;", Flavor::StdC11)?;
Ok::<(), parc::parse::ParseError>(())
}

This is the right choice for:

  • unit tests
  • parser experiments
  • editor tooling for partial snippets

Build an analyzer

The normal analyzer flow is:

  1. Parse with driver or parse
  2. Traverse with visit
  3. Use span and loc for diagnostics

Example outline:

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::visit::{self, Visit};
use parc::{ast, span};

struct FunctionCounter {
    count: usize,
}

impl<'ast> Visit<'ast> for FunctionCounter {
    fn visit_function_definition(
        &mut self,
        node: &'ast ast::FunctionDefinition,
        span: &'ast span::Span,
    ) {
        self.count += 1;
        visit::visit_function_definition(self, node, span);
    }
}

let parsed = parse(&Config::default(), "src/main.c")?;
let mut counter = FunctionCounter { count: 0 };
counter.visit_translation_unit(&parsed.unit);
println!("functions: {}", counter.count);
Ok::<(), parc::driver::Error>(())
}

Debug the parse tree

Use the printer when you need a human-readable structural dump:

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::print::Printer;
use parc::visit::Visit;

let parsed = parse(&Config::default(), "src/main.c")?;

let mut out = String::new();
Printer::new(&mut out).visit_translation_unit(&parsed.unit);
println!("{}", out);
Ok::<(), parc::driver::Error>(())
}

Rule of thumb

  • If you want SourcePackage, start with scan.
  • If preprocessing matters and you still want the AST, start with driver.
  • If you already have plain text in memory, start with parse.
  • If you need diagnostics tied back to original files, keep the preprocessed source string.
  • If another crate needs PARC output, stop at SourcePackage and translate it outside parc/src/**.

Driver API

The driver module is the high-level API for file parsing. It runs a system preprocessor, then parses the resulting text into a TranslationUnit.

Main types

#![allow(unused)]
fn main() {
pub struct Config {
    pub cpp_command: String,
    pub cpp_options: Vec<String>,
    pub flavor: Flavor,
}

pub enum Flavor {
    StdC11,
    GnuC11,
    ClangC11,
}

pub struct Parse {
    pub source: String,
    pub unit: TranslationUnit,
}
}

The return value matters:

  • source is the preprocessed source PARC actually parsed
  • unit is the AST root

Basic file parsing

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};

let config = Config::default();
let parsed = parse(&config, "examples/demo.c")?;

println!("preprocessed bytes: {}", parsed.source.len());
println!("top-level nodes: {}", parsed.unit.0.len());
Ok::<(), parc::driver::Error>(())
}

Configuring the preprocessor

You can override both the preprocessor executable and its arguments.

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config, Flavor};

let config = Config {
    cpp_command: "gcc".into(),
    cpp_options: vec![
        "-E".into(),
        "-Iinclude".into(),
        "-DMODE=2".into(),
        "-nostdinc".into(),
    ],
    flavor: Flavor::GnuC11,
};

let parsed = parse(&config, "src/input.c")?;
Ok::<(), parc::driver::Error>(())
}

This is the place to inject:

  • include directories with -I...
  • macro definitions with -D...
  • stricter or more isolated builds with -nostdinc

GCC vs Clang helpers

The convenience constructors also select parser flavor:

#![allow(unused)]
fn main() {
use parc::driver::Config;

let gcc = Config::with_gcc();     // gcc -E, GNU flavor
let clang = Config::with_clang(); // clang -E, Clang flavor
}

Use these when you want the parser flavor to match the syntax accepted by the external preprocessor.

Parsing preprocessed text directly

If you already have .i-style content, skip parse and call parse_preprocessed.

#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};

let source = r#"
# 1 "sample.i"
typedef int count_t;
count_t next(count_t x) { return x + 1; }
"#
.to_string();

let parsed = parse_preprocessed(&Config::default(), source)?;
println!("{}", parsed.unit.0.len());
Ok::<(), parc::driver::SyntaxError>(())
}

Error model

driver::parse returns:

#![allow(unused)]
fn main() {
Result<Parse, parc::driver::Error>
}

The error variants are:

  • PreprocessorError(io::Error) when the external preprocessor fails
  • SyntaxError(SyntaxError) when preprocessing succeeded but parsing failed

Working with syntax errors

SyntaxError includes:

  • source: the preprocessed source
  • line, column, offset: the parse failure position in that source
  • expected: a set of expected tokens

Example:

#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};

let broken = "int main( { return 0; }".to_string();
match parse_preprocessed(&Config::default(), broken) {
    Ok(_) => {}
    Err(err) => {
        eprintln!("parse failed at {}:{}", err.line, err.column);
        eprintln!("expected: {:?}", err.expected);
    }
}
}

If the preprocessed source contains line markers, SyntaxError::get_location() can reconstruct the original file and include stack.
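The mapping idea behind that reconstruction can be sketched in plain Rust. This is an illustrative toy under stated assumptions, not the parc implementation: origin_of is a hypothetical helper that reads GNU-style linemarkers of the form # LINE "FILE" and ignores any trailing marker flags.

```rust
// Toy sketch: map a line index in preprocessed output back to the
// original file named by the most recent linemarker. Illustrative
// only; trailing marker flags are ignored.
fn origin_of(pp_lines: &[&str], target: usize) -> Option<(String, usize)> {
    let mut file: Option<String> = None;
    let mut base = 0; // preprocessed line where the marker took effect
    let mut orig = 1; // original line number named by the marker
    for (i, line) in pp_lines.iter().enumerate() {
        if i == target {
            return file.map(|f| (f, orig + (target - base)));
        }
        if let Some(rest) = line.strip_prefix("# ") {
            let mut parts = rest.splitn(2, ' ');
            orig = parts.next()?.parse().ok()?;
            let name = parts.next()?.trim().trim_start_matches('"');
            file = Some(name.split('"').next()?.to_string());
            base = i + 1;
        }
    }
    None
}

fn main() {
    let pp = ["# 1 \"demo.h\"", "typedef int count_t;", "count_t answer(void);"];
    // preprocessed line index 2 corresponds to line 2 of demo.h
    println!("{:?}", origin_of(&pp, 2));
}
```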

Built-in preprocessor

PARC includes a built-in C preprocessor that eliminates the need for an external gcc or clang binary. Use parse_builtin instead of parse:

#![allow(unused)]
fn main() {
use parc::driver::{parse_builtin, Config};
use std::path::Path;

let config = Config::with_gcc();
let include_paths = vec![Path::new("/usr/include")];
let parsed = parse_builtin(&config, "src/input.c", &include_paths)?;
Ok::<(), parc::driver::Error>(())
}

The built-in preprocessor supports:

  • Object-like and function-like macros (with #, ##, __VA_ARGS__)
  • Conditional compilation (#if, #ifdef, #ifndef, #elif, #else, #endif)
  • #include resolution with configurable search paths
  • Include guard detection and optimization
  • defined() operator in #if expressions
  • Full C constant expression evaluation (arithmetic, bitwise, logical, ternary)
  • Predefined target macros (architecture, OS, GCC compatibility)

Macro extraction

To extract all #define macros from a C file (equivalent to gcc -dD -E):

#![allow(unused)]
fn main() {
use parc::driver::capture_macros;
use std::path::Path;

let macros = capture_macros("src/input.c", &[Path::new("/usr/include")])?;
for (name, value) in &macros {
    println!("#define {} {}", name, value);
}
Ok::<(), parc::driver::Error>(())
}

This returns all macros active after preprocessing, including predefined target macros and macros from included headers.

Practical advice

  • Keep parsed.source if you plan to report errors later.
  • Use parse_preprocessed for deterministic regression tests.
  • Prefer explicit cpp_options in tools and CI so parse behavior stays reproducible.
  • Use parse_builtin when you need zero-dependency parsing without a C toolchain.

Built-in Preprocessor

PARC includes a complete built-in C preprocessor in the parc::preprocess module. This eliminates the runtime dependency on gcc or clang for preprocessing.

Architecture

The preprocessor is split into focused modules:

Module        Purpose
token         Token types (Ident, Number, Punct, etc.)
lexer         Preprocessor tokenizer (§6.4 preprocessing tokens)
directive     Directive parser (#define, #if, #include, etc.)
macros        Macro table, object-like and function-like expansion
expr          #if constant expression evaluator
processor     Conditional compilation engine
include       #include resolution with search paths and guard tracking
predefined    Target-specific predefined macros

Quick start

#![allow(unused)]
fn main() {
use parc::preprocess::preprocess;

let output = preprocess("#define X 42\nint a = X;\n");
// output.tokens contains the expanded token stream
}

Macro expansion

Both object-like and function-like macros are supported:

#define SIZE 1024
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define LOG(fmt, ...) printf(fmt, __VA_ARGS__)

Features:

  • # stringification operator
  • ## token pasting operator
  • __VA_ARGS__ for variadic macros
  • Recursive expansion with “paint set” to prevent infinite recursion (C standard §6.10.3.4)
  • Self-referential macros handled correctly (#define X X + 1 expands to X + 1)
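The paint-set rule can be sketched in a few lines of std-only Rust. This is a minimal illustration of the idea, not parc's macro engine; expand and its flat token-list macros are simplifying assumptions (no arguments, no stringification).

```rust
use std::collections::{HashMap, HashSet};

// Sketch of "paint set" expansion: a macro name currently being
// expanded is painted and not expanded again, so `#define X X + 1`
// yields `X + 1` instead of looping forever.
fn expand(
    token: &str,
    macros: &HashMap<&str, Vec<&str>>,
    painted: &mut HashSet<String>,
) -> Vec<String> {
    if painted.contains(token) {
        return vec![token.to_string()]; // painted: emit verbatim
    }
    match macros.get(token) {
        None => vec![token.to_string()], // not a macro
        Some(body) => {
            painted.insert(token.to_string());
            let out = body.iter().flat_map(|t| expand(t, macros, painted)).collect();
            painted.remove(token);
            out
        }
    }
}

fn main() {
    let mut macros = HashMap::new();
    macros.insert("X", vec!["X", "+", "1"]); // #define X X + 1
    let result = expand("X", &macros, &mut HashSet::new());
    println!("{}", result.join(" "));
}
```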

Conditional compilation

All standard conditional directives are supported:

#if CONDITION
#ifdef NAME
#ifndef NAME
#elif CONDITION
#else
#endif

The #if expression evaluator supports:

  • Integer literals (decimal, octal, hex, binary)
  • Character constants ('x')
  • defined(NAME) and defined NAME
  • All C operators: arithmetic, bitwise, logical, comparison, ternary
  • Undefined identifiers evaluate to 0 (per C standard §6.10.1p4)
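The two identifier rules can be shown in isolation. This is a hypothetical helper, not the parc evaluator: substitute handles only the parenthesized defined(NAME) form and rewrites tokens before any arithmetic would run.

```rust
use std::collections::HashSet;

// Sketch of the identifier rules in #if evaluation: `defined(NAME)`
// becomes 1 or 0 from the macro table, and any identifier still left
// afterwards becomes 0.
fn substitute(tokens: &[&str], table: &HashSet<&str>) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let is_ident = tokens[i]
            .chars()
            .next()
            .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
        if tokens[i] == "defined"
            && i + 3 < tokens.len()
            && tokens[i + 1] == "("
            && tokens[i + 3] == ")"
        {
            // defined ( NAME )
            out.push(if table.contains(tokens[i + 2]) { "1" } else { "0" }.to_string());
            i += 4;
        } else if is_ident {
            out.push("0".to_string()); // undefined identifier evaluates to 0
            i += 1;
        } else {
            out.push(tokens[i].to_string());
            i += 1;
        }
    }
    out
}

fn main() {
    let table: HashSet<&str> = ["NDEBUG"].into_iter().collect();
    let tokens = ["defined", "(", "NDEBUG", ")", "&&", "MYSTERY"];
    println!("{}", substitute(&tokens, &table).join(" "));
}
```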

Include resolution

#![allow(unused)]
fn main() {
use parc::preprocess::{IncludeResolver, Processor};

let mut resolver = IncludeResolver::new();
resolver.add_system_path("/usr/include");
resolver.add_local_path("./include");

let mut processor = Processor::new();
let result = resolver.preprocess_file(
    std::path::Path::new("src/main.c"),
    &mut processor,
);
}

Features:

  • "local" includes search relative to the including file, then local paths
  • <system> includes search system paths only
  • Include guard detection (#ifndef X / #define X / ... / #endif)
  • File content caching
  • Maximum include depth (200) to prevent infinite recursion
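Guard detection amounts to recognizing one syntactic shape. The helper below is hypothetical and deliberately simplified (it ignores nested conditionals inside the body); it only illustrates the pattern the resolver looks for.

```rust
// Hypothetical sketch, not parc's code: recognize the classic
// `#ifndef X / #define X / ... / #endif` include-guard shape so a
// file can be skipped on re-inclusion.
fn guard_macro(lines: &[&str]) -> Option<String> {
    let mut meaningful = lines.iter().map(|l| l.trim()).filter(|l| !l.is_empty());
    let name = meaningful.next()?.strip_prefix("#ifndef")?.trim().to_string();
    if meaningful.next()?.strip_prefix("#define")?.trim() != name {
        return None; // the #define must name the same macro
    }
    // the guard must close at the final non-empty line
    if meaningful.last()? != "#endif" {
        return None;
    }
    Some(name)
}

fn main() {
    let header = ["#ifndef DEMO_H", "#define DEMO_H", "int demo(void);", "#endif"];
    println!("{:?}", guard_macro(&header));
}
```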

Predefined macros

Target-specific macros are available for common platforms:

#![allow(unused)]
fn main() {
use parc::preprocess::{MacroTable, Target, define_target_macros};

let mut table = MacroTable::new();
define_target_macros(&mut table, &Target::host());
// Now table has __STDC__, __linux__, __x86_64__, __GNUC__, etc.
}

Supported targets:

  • Architectures: x86_64, aarch64, x86, arm
  • Operating systems: Linux, macOS (Darwin), Windows

Standard macros defined:

  • __STDC__, __STDC_VERSION__, __STDC_HOSTED__
  • Architecture-specific: __x86_64__, __aarch64__, __i386__, __arm__
  • OS-specific: __linux__, __APPLE__, _WIN32, etc.
  • GCC compatibility: __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__
  • Type sizes: __SIZEOF_POINTER__, __SIZEOF_INT__, etc.
  • Limits: __CHAR_BIT__, __INT_MAX__, __LONG_MAX__

Source IR

The parc::ir module defines the durable intermediate representation produced by the PARC frontend. It is the primary contract between the parser/extractor and downstream consumers (LINC, GERC).

Design Principles

  • Smaller than the AST: only normalized declarations, not the full syntax tree
  • Serializable: all types derive serde::Serialize and serde::Deserialize
  • Parser-agnostic: downstream consumers should depend on parc::ir, not parc::ast
  • No link/binary concerns: no ABI probing, no library paths, no symbol validation

Key Types

SourcePackage

The top-level container:

#![allow(unused)]
fn main() {
use parc::ir::SourcePackage;

let pkg = SourcePackage::new();
assert!(pkg.is_empty());
}

SourceType

Represents C types at source level:

Void, Bool, Char, SChar, UChar, Short, UShort,
Int, UInt, Long, ULong, LongLong, ULongLong,
Float, Double, LongDouble, Int128, UInt128,
Pointer, Array, Qualified, FunctionPointer,
TypedefRef, RecordRef, EnumRef, Opaque

SourceItem

One extracted declaration:

  • Function — function declaration with name, parameters, return type, calling convention
  • Record — struct/union with optional fields
  • Enum — enum with named variants and optional values
  • TypeAlias — typedef declaration
  • Variable — extern variable declaration
  • Unsupported — placeholder for unrepresentable declarations

SourceMacro

Captured preprocessor macro with form (object-like/function-like), kind, and optional parsed value.

SourceDiagnostic

Frontend diagnostic with kind, severity, message, optional location, and optional item name.

Provenance

  • SourceOrigin — where a declaration came from (Entry, UserInclude, System, Unknown)
  • DeclarationProvenance — per-item provenance metadata
  • MacroProvenance — per-macro provenance metadata
  • SourceTarget — compiler/target identity
  • SourceInputs — entry headers, include dirs, defines

JSON Serialization

All IR types support JSON roundtrip:

#![allow(unused)]
fn main() {
use parc::ir::SourcePackage;

let pkg = SourcePackage::new();
let json = serde_json::to_string_pretty(&pkg).unwrap();
let back: SourcePackage = serde_json::from_str(&json).unwrap();
assert_eq!(pkg, back);
}

Querying

SourcePackage provides typed accessors:

#![allow(unused)]
fn main() {
// pkg.functions()      -> Iterator<Item = &SourceFunction>
// pkg.records()        -> Iterator<Item = &SourceRecord>
// pkg.enums()          -> Iterator<Item = &SourceEnum>
// pkg.type_aliases()   -> Iterator<Item = &SourceTypeAlias>
// pkg.variables()      -> Iterator<Item = &SourceVariable>
// pkg.unsupported_items() -> ...
// pkg.find_function("malloc")
// pkg.find_record("point")
// pkg.find_enum("color")
// pkg.find_type_alias("size_t")
// pkg.find_variable("errno")
}

Extraction

The parc::extract module converts a parsed C AST into the normalized SourcePackage IR. It handles all declaration families.

Quick Start

#![allow(unused)]
fn main() {
use parc::extract;

let source = r#"
    typedef unsigned long size_t;
    void *malloc(size_t size);
    struct point { int x; int y; };
"#;

let pkg = extract::extract_from_source(source).unwrap();
assert_eq!(pkg.function_count(), 1);
assert_eq!(pkg.record_count(), 1);
assert_eq!(pkg.type_alias_count(), 1);
}

API Functions

extract_from_source

Parse and extract in one step using GNU C11 flavor:

#![allow(unused)]
fn main() {
let pkg = parc::extract::extract_from_source("int foo(void);").unwrap();
}

parse_and_extract

Parse and extract with a specific flavor:

#![allow(unused)]
fn main() {
let pkg = parc::extract::parse_and_extract(
    "int foo(void);",
    parc::driver::Flavor::StdC11,
).unwrap();
}

extract_from_translation_unit

Extract from an already-parsed AST:

#![allow(unused)]
fn main() {
let unit = parc::parse::translation_unit("int foo(void);", parc::driver::Flavor::StdC11).unwrap();
let pkg = parc::extract::extract_from_translation_unit(&unit, Some("test.h".into()));
}

parse_and_extract_resilient

Parse with error recovery and extract what’s possible:

#![allow(unused)]
fn main() {
let pkg = parc::extract::parse_and_extract_resilient(
    "int valid;\n@@@bad@@@;\nint also_valid;",
    parc::driver::Flavor::StdC11,
);
}

extract_file

Read a file from disk and extract:

#![allow(unused)]
fn main() {
let pkg = parc::extract::extract_file("path/to/header.h", parc::driver::Flavor::GnuC11).unwrap();
assert!(pkg.source_path.is_some());
}

What Gets Extracted

C Declaration            Source Item
typedef int T;           SourceTypeAlias
int foo(void);           SourceFunction
int foo(void) { ... }    SourceFunction (body ignored)
struct S { int x; };     SourceRecord
struct S;                SourceRecord (opaque)
union U { ... };         SourceRecord (Union kind)
enum E { A, B };         SourceEnum
extern int x;            SourceVariable
static int f() {}        Diagnostic (not bindable)
_Static_assert(...)      Diagnostic

Diagnostics

The extractor produces diagnostics for constructs it cannot fully represent:

  • Bitfield widths (partial representation)
  • Inline/noreturn specifiers (ignored)
  • Calling convention attributes (captured on function, other attributes warned)
  • K&R function declarations (unsupported)
  • Block pointers (unsupported)
  • Static functions (not bindable)

Header Scanning

parc::scan is the highest-level PARC API for people who want the source contract, not just the AST. It preprocesses headers, parses them, extracts items, and returns a SourcePackage plus the preprocessed source text.

Quick Start

#![allow(unused)]
fn main() {
use parc::scan::{ScanConfig, scan_headers};

let config = ScanConfig::new()
    .entry_header("api.h")
    .include_dir("/usr/include")
    .define_flag("NDEBUG")
    .with_builtin_preprocessor();

let result = scan_headers(&config).unwrap();
let pkg = result.package;
}

What scan really owns

The scan path currently owns all of these steps:

  1. choose builtin or external preprocessing
  2. build the preprocessing environment
  3. parse the preprocessed translation unit
  4. extract declarations into parc::ir
  5. attach input metadata and diagnostics
  6. optionally resolve typedef chains in the produced package

That makes it the closest thing PARC has to a “source artifact producer”.

ScanConfig

Builder for scan configuration:

Method                         Description
entry_header(path)             Add an entry-point header
include_dir(path)              Add a preprocessor include search path
define(name, value)            Add a preprocessor define with value
define_flag(name)              Add a flag-style define (no value)
with_compiler(cmd)             Set the external preprocessor command
with_flavor(flavor)            Set the parser flavor
with_builtin_preprocessor()    Use the built-in preprocessor

Preprocessing Modes

External (default)

Uses gcc -E or clang -E to preprocess headers. Requires the compiler to be installed. Supports all system headers.

Built-in

Uses parc::preprocess directly. This is useful for controlled fixtures and repo-local tests. It is not a promise that the built-in preprocessor already matches every hostile system-header stack.

ScanResult

The scan produces:

  • package: SourcePackage — the extracted declarations and metadata
  • preprocessed_source: String — the preprocessed source text

Intake

For already-preprocessed source (e.g., output of gcc -E), use parc::intake::PreprocessedInput:

#![allow(unused)]
fn main() {
use parc::intake::PreprocessedInput;

let input = PreprocessedInput::from_string("int foo(void);")
    .with_path("output.i")
    .with_flavor(parc::driver::Flavor::GnuC11);

let pkg = input.extract();
}

What to expect from failures

scan_headers() can fail early on preprocessing setup problems, and it can also return a package with parse diagnostics if preprocessing succeeded but the source could not be fully parsed.

That split is intentional:

  • operational setup failures are Err(...)
  • source-level failures become package.diagnostics when possible

Parser API

The parse module exposes direct parsing functions that work on in-memory strings. Unlike driver, it does not invoke an external preprocessor.

Available entry points

#![allow(unused)]
fn main() {
parse::constant(source, flavor)
parse::expression(source, flavor)
parse::declaration(source, flavor)
parse::statement(source, flavor)
parse::translation_unit(source, flavor)
}

These map to progressively larger grammar fragments.

Return types

The direct parser returns the same ParseResult<T> shape for every entry point:

#![allow(unused)]
fn main() {
type ParseResult<T> = Result<T, ParseError>;
}

ParseError contains:

  • line
  • column
  • offset
  • expected

That makes it well suited for parser tests and editor integrations.

Parse an expression

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let expr = parse::expression("value + 1 * scale", Flavor::StdC11)?;
println!("{:#?}", expr);
Ok::<(), parc::parse::ParseError>(())
}

The return type is Box<Node<Expression>>, so you get both the expression and its span.

Parse a declaration

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let decl = parse::declaration(
    "static const unsigned long mask = 0xff;",
    Flavor::StdC11,
)?;

println!("{:#?}", decl.node);
Ok::<(), parc::parse::ParseError>(())
}

Declarations are useful when you want to inspect:

  • storage class
  • type qualifiers
  • declarator structure
  • initializers

Parse a statement

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let stmt = parse::statement(
    "for (int i = 0; i < 4; i++) total += i;",
    Flavor::StdC11,
)?;

println!("{:#?}", stmt.node);
Ok::<(), parc::parse::ParseError>(())
}

Parse a whole translation unit

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let source = r#"
typedef int count_t;
count_t inc(count_t x) { return x + 1; }
"#;

let unit = parse::translation_unit(source, Flavor::StdC11)?;
println!("items: {}", unit.0.len());
Ok::<(), parc::parse::ParseError>(())
}

Flavor-sensitive parsing

GNU or Clang syntax only parses when you select a compatible flavor.

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let gnu_expr = "({ int x = 1; x + 2; })";
assert!(parse::expression(gnu_expr, Flavor::GnuC11).is_ok());
assert!(parse::expression(gnu_expr, Flavor::StdC11).is_err());
}

When to prefer parse

Use parse when:

  • you already have a string in memory
  • you are testing grammar behavior directly
  • you are parsing snippets, not full files
  • you want a deterministic input without shelling out to gcc or clang

Use driver instead when preprocessing is part of the problem.

AST Model

The ast module contains the syntax tree PARC produces after parsing. Most types track the C11 grammar closely, with additional variants for supported GNU and Clang extensions.

Core wrapper types

Many parsed values are wrapped in Node<T>:

#![allow(unused)]
fn main() {
pub struct Node<T> {
    pub node: T,
    pub span: Span,
}
}

That means most interesting values come with a byte range in the parsed source.
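Since spans are byte ranges, slicing the parsed source with a node's span recovers the exact original text. The sketch below uses stand-in Span and Node types with a hand-picked range; it only illustrates the relationship, not parc's real definitions.

```rust
// Stand-in types mirroring the Node<T> shape above (illustrative only).
struct Span { start: usize, end: usize }
struct Node<T> { node: T, span: Span }

fn main() {
    let source = "int answer = 42;";
    // Hypothetical span for the identifier `answer` (bytes 4..10).
    let ident = Node { node: "answer".to_string(), span: Span { start: 4, end: 10 } };
    // The span slices back to the original text of the node.
    assert_eq!(&source[ident.span.start..ident.span.end], "answer");
    println!("{} @ {}..{}", ident.node, ident.span.start, ident.span.end);
}
```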

Top-level structure

The root is:

#![allow(unused)]
fn main() {
pub struct TranslationUnit(pub Vec<Node<ExternalDeclaration>>);
}

Top-level items are:

#![allow(unused)]
fn main() {
pub enum ExternalDeclaration {
    Declaration(Node<Declaration>),
    StaticAssert(Node<StaticAssert>),
    FunctionDefinition(Node<FunctionDefinition>),
}
}

So a translation unit is a flat list of:

  • declarations
  • static assertions
  • function definitions

Declarations

Declarations are split into specifiers and declarators:

#![allow(unused)]
fn main() {
pub struct Declaration {
    pub specifiers: Vec<Node<DeclarationSpecifier>>,
    pub declarators: Vec<Node<InitDeclarator>>,
}
}

This mirrors C’s real syntax. For example:

static const unsigned long value = 42;

roughly becomes:

  • storage class specifier: Static
  • type qualifier: Const
  • type specifiers: Unsigned, Long
  • one declarator with identifier value
  • initializer expression 42

Declarators are the hard part

Declarator separates:

  • the name-bearing core (DeclaratorKind)
  • derived layers such as pointers, arrays, and functions
  • extension nodes

That design lets PARC represent C declarators without flattening away their structure.

Examples:

  • int *p;
  • int values[16];
  • int (*handler)(int);
  • void f(int x, int y);
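The layered idea can be pictured with a toy model. These types are illustrative only, not the parc ast definitions: a declarator keeps a name-bearing core plus ordered derived layers, so reading the layers outward describes the type of int (*handler)(int).

```rust
// Toy model of layered declarators (not parc's real types).
#[allow(dead_code)]
enum Derived {
    Pointer,
    Array(usize),
    Function(usize), // parameter count, just for this sketch
}

struct Declarator {
    name: &'static str,
    layers: Vec<Derived>,
}

// Read the derived layers in order to describe the declared type.
fn describe(d: &Declarator, base: &str) -> String {
    let mut s = format!("{}: ", d.name);
    for layer in &d.layers {
        s += match layer {
            Derived::Pointer => "pointer to ",
            Derived::Array(_) => "array of ",
            Derived::Function(_) => "function returning ",
        };
    }
    s + base
}

fn main() {
    // int (*handler)(int);
    let handler = Declarator {
        name: "handler",
        layers: vec![Derived::Pointer, Derived::Function(1)],
    };
    println!("{}", describe(&handler, "int"));
}
```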

Expressions

Expression is a large enum covering C expression syntax:

  • identifiers
  • constants
  • string literals
  • member access
  • calls
  • casts
  • unary operators
  • binary operators
  • conditional expressions
  • comma expressions
  • sizeof, _Alignof
  • GNU statement expressions
  • offsetof and va_arg expansions

Examples:

x
42
ptr->field
f(a, b)
(int) value
a + b * c
cond ? left : right
({ int t = 1; t + 2; })

Statements

Statement covers:

  • labeled statements
  • compound blocks
  • expression statements
  • if
  • switch
  • while
  • do while
  • for
  • goto
  • continue
  • break
  • return
  • GNU asm statements

Blocks contain BlockItem, which can be:

  • a declaration
  • a static assertion
  • another statement

That means a compound statement preserves the declaration/statement distinction instead of erasing everything into one generic node list.

Types and declarator support

Important declaration-side types include:

  • TypeSpecifier
  • TypeQualifier
  • StorageClassSpecifier
  • FunctionSpecifier
  • AlignmentSpecifier
  • TypeName
  • DerivedDeclarator
  • ParameterDeclaration
  • Initializer
  • Designator

This is enough to model:

  • pointer chains
  • arrays and VLA-like forms
  • function parameter lists
  • designated initializers
  • anonymous and named structs/unions/enums
  • typedef names
  • typeof

Extension nodes

PARC includes explicit AST nodes for extensions instead of hiding them:

  • Extension::Attribute
  • Extension::AsmLabel
  • Extension::AvailabilityAttribute
  • TypeSpecifier::TypeOf
  • Statement::Asm
  • Expression::Statement

That makes it practical to write tools that either support or reject extension syntax intentionally.

Reading the AST effectively

When working with PARC, a useful order is:

  1. Start at TranslationUnit
  2. Split declarations from function definitions
  3. Inspect declarators carefully for type shape
  4. Use the visitor API instead of hand-recursing everywhere
  5. Use Printer to learn unfamiliar subtrees

Visitor Pattern

The visit module provides recursive AST traversal. It exposes:

  • a Visit<'ast> trait with hook methods
  • free functions like visit_expression and visit_function_definition that recurse into children

The important rule

When you override a method, continue the walk by calling the free function from parc::visit, not the trait method on self. Calling self.visit_* from inside the override dispatches straight back into your override and recurses endlessly without ever visiting the children.

Count function definitions

#![allow(unused)]
fn main() {
use parc::{ast, span, visit};
use parc::visit::Visit;

struct FunctionCounter {
    count: usize,
}

impl<'ast> Visit<'ast> for FunctionCounter {
    fn visit_function_definition(
        &mut self,
        node: &'ast ast::FunctionDefinition,
        span: &'ast span::Span,
    ) {
        self.count += 1;
        visit::visit_function_definition(self, node, span);
    }
}
}

Collect identifiers from expressions

#![allow(unused)]
fn main() {
use parc::{ast, span, visit};
use parc::visit::Visit;

struct IdentifierCollector {
    names: Vec<String>,
}

impl<'ast> Visit<'ast> for IdentifierCollector {
    fn visit_identifier(&mut self, node: &'ast ast::Identifier, span: &'ast span::Span) {
        self.names.push(node.name.clone());
        visit::visit_identifier(self, node, span);
    }
}
}

Use the visitor

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::visit::Visit;

let parsed = parse(&Config::default(), "examples/sample.c")?;

let mut counter = FunctionCounter { count: 0 };
counter.visit_translation_unit(&parsed.unit);

println!("functions: {}", counter.count);
Ok::<(), parc::driver::Error>(())
}

When to override which method

  • Override visit_translation_unit for whole-file summaries
  • Override visit_function_definition for function-level analysis
  • Override visit_declaration for declaration inspection
  • Override visit_expression for expression-wide checks
  • Override narrow hooks like visit_call_expression when you only care about one form

Traversal style

Two common styles work well:

Pre-order

Do work before recursing:

#![allow(unused)]
fn main() {
fn visit_expression(&mut self, node: &'ast ast::Expression, span: &'ast span::Span) {
    self.seen += 1;
    visit::visit_expression(self, node, span);
}
}

Selective traversal

Only recurse when the node passes a filter:

#![allow(unused)]
fn main() {
fn visit_statement(&mut self, node: &'ast ast::Statement, span: &'ast span::Span) {
    if matches!(node, ast::Statement::Return(_)) {
        self.returns += 1;
    }
    visit::visit_statement(self, node, span);
}
}

Practical advice

  • Start with a broad hook like visit_expression while learning the tree.
  • Narrow to specific hooks once you understand the shapes you care about.
  • Pair the visitor with Printer when a subtree is unclear.

Location Tracking

PARC tracks source positions in two related ways:

  • Span stores byte offsets into the parsed input
  • loc maps byte offsets in preprocessed source back to original files and lines

Span

Span is a byte range:

#![allow(unused)]
fn main() {
pub struct Span {
    pub start: usize,
    pub end: usize,
}
}

Most AST values are wrapped in Node<T>, which adds a span field:

#![allow(unused)]
fn main() {
pub struct Node<T> {
    pub node: T,
    pub span: Span,
}
}

What spans point to

This depends on the API you used:

  • with parse::*, spans refer to the string you passed in
  • with driver::parse_preprocessed, spans refer to the preprocessed string you passed in
  • with driver::parse, spans refer to the preprocessor output stored in Parse::source

That last case is important: spans do not directly point into the original .c file when preprocessing has inserted line markers or expanded includes.

Mapping offsets back to files

The loc module reads preprocessor line markers like:

# 42 "include/header.h" 1

From those markers, get_location_for_offset reconstructs:

  • the active file
  • the active line number
  • the include stack

Basic example

#![allow(unused)]
fn main() {
use parc::loc::get_location_for_offset;

let src = "# 1 \"main.c\"\nint value;\n";
let (loc, includes) = get_location_for_offset(src, 18);

assert_eq!(loc.file, "main.c");
assert!(includes.is_empty());
}

Using spans with locations

The common pattern is:

  1. take a node span
  2. use span.start or span.end
  3. map that offset through loc

Example:

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::loc::get_location_for_offset;

let parsed = parse(&Config::default(), "examples/sample.c")?;

if let Some(first) = parsed.unit.0.first() {
    let (loc, include_stack) = get_location_for_offset(&parsed.source, first.span.start);
    println!("first item starts in {}:{}", loc.file, loc.line);
    println!("include depth: {}", include_stack.len());
}
Ok::<(), parc::driver::Error>(())
}

SyntaxError::get_location

For parser failures in the driver path, driver::SyntaxError already exposes:

#![allow(unused)]
fn main() {
err.get_location()
}

That returns:

  • the active source location
  • the include chain that led there

This is the best starting point for user-facing diagnostics.

Caveat: byte offsets, not UTF-16 columns

PARC stores Rust byte offsets. That is usually what you want for source processing, but if you are feeding results into another tool that expects a different coordinate system, convert explicitly.
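
For example, an LSP client expects a line number plus a UTF-16 column, not a byte offset. A small helper can do that conversion; the function below is an illustration and is not part of parc.

```rust
// Hypothetical helper, not part of parc: map a byte offset in `src` to a
// 1-based line number and a 0-based UTF-16 column. The offset must fall on
// a UTF-8 character boundary (span endpoints from a parser always do).
fn offset_to_lsp_position(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset];
    // Count newlines before the offset to get the 1-based line.
    let line = before.bytes().filter(|&b| b == b'\n').count() + 1;
    // Find where the current line starts, then measure it in UTF-16 units.
    let line_start = before.rfind('\n').map(|i| i + 1).unwrap_or(0);
    let column = src[line_start..offset].encode_utf16().count();
    (line, column)
}

fn main() {
    // 'é' is two bytes in UTF-8 but one UTF-16 code unit.
    let src = "int é;\nint x;";
    assert_eq!(offset_to_lsp_position(src, 6), (1, 5)); // the ';' after 'é'
    assert_eq!(offset_to_lsp_position(src, 8), (2, 0)); // start of line 2
    println!("ok");
}
```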

Testing

parc is the source-meaning crate in the toolchain, so its tests should prove three things:

  • the frontend accepts or rejects source as intended
  • the extracted SourcePackage contract carries the intended meaning
  • cross-package composition can start from parc artifacts without relying on parc internals

PARC has two broad testing layers:

  • direct parser/API tests in src/tests
  • corpus-style fixtures under test/reftests/ and, when present, test/full_apps/

There are also explicit grouped failure suites:

  • failure_matrix_preprocess for scan/preprocessor hard failures and conservative scan outcomes
  • failure_matrix_source for source-parse hard failures, resilient recovery, and diagnostic-preserving extraction

Basic commands

The repository Makefile wraps the normal Cargo flow:

make build
make test

Those run:

  • cargo build --release
  • cargo test

Hermeticity split

The large PARC surfaces should be read in three groups:

  • always-on hermetic baselines
  • host-dependent but high-value ladders
  • hostile or conservative-failure surfaces

The hermetic baselines should remain the default confidence floor. The host-dependent ladders should strengthen confidence when available. The failure surfaces should prove that PARC stays diagnostic and deterministic when it cannot fully model a header family yet.

Contract tests

Contract tests are the tests a downstream toolchain should treat as the main statement of support:

  • parse_api tests for direct parser entry points
  • extraction tests for declaration/source modeling
  • scan tests for preprocessing and multi-file source intake
  • consumability tests for the SourcePackage artifact

If one of those changes meaningfully, the corresponding book chapter should change in the same patch.

Parse API tests

src/tests/parse_api.rs checks the public parse entry points directly.

Examples covered in the repository include:

  • constants
  • expressions
  • declarations
  • statements
  • translation units

This layer is useful when:

  • adding a new public parser entry point
  • fixing a small grammar regression
  • documenting a minimal parsing example

Reference tests

The reftest harness in src/tests/reftests.rs reads files from test/reftests/. Each case stores:

  • the source snippet
  • optional #pragma directives that affect parsing
  • an expected AST printout between /*=== and ===*/

That means reftests verify both:

  • whether parsing succeeds
  • whether the produced tree matches the expected printer output

Reftest update workflow

The harness supports TEST_UPDATE=1 to rewrite expected outputs when printer changes are intentional.

TEST_UPDATE=1 cargo test reftests

Use that carefully. It is appropriate after deliberate AST or printer changes, not as a substitute for reviewing diffs.

Full-app fixtures

The repository includes a full-app harness in src/tests/full_apps.rs. It supports fixture directories with a fixture.toml manifest describing:

  • mode
  • flavor
  • entry
  • expected
  • include_dirs
  • allow_system_includes
  • tags

Supported modes are:

  • translation_unit
  • driver
  • preprocessed

This is the right layer for:

  • multi-file examples
  • include-path behavior
  • external fixture snapshots
  • deterministic .i inputs
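
A manifest using those fields might look like the following sketch. The field names come from the list above; the values are invented for illustration and do not come from a real fixture.

```toml
# Hypothetical fixture.toml: field names from the manifest description,
# values invented for illustration.
mode = "driver"
flavor = "gnu"
entry = "src/main.c"
expected = "expected.txt"
include_dirs = ["include"]
allow_system_includes = false
tags = ["synthetic"]
```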

Filtering larger fixture runs

The full-app runner supports environment filters:

FULL_APP_FILTER=musl/stdint make test
FULL_APP_TAG=synthetic make test

These are useful when debugging one fixture family instead of running the whole corpus.

Current workspace note

The test harness and README describe test/full_apps, but that directory is not present in this workspace snapshot. The book documents the supported format because the code and README define it.

Extraction tests

src/tests/extraction_fixtures.rs contains fixture-based tests for the extraction pipeline: typical C patterns (stdio-style, nested structs, typedef chains, function pointers, etc.).

src/extract/mod.rs also contains unit tests for each declaration family.

Hostile header tests

src/tests/hostile_headers.rs covers edge-case and historically problematic C declarations: deep pointer nesting, anonymous structs/enums, specifier ordering variations, bitfield-only structs, extreme enum values, forward-then-define patterns, etc.

Recovery tests

src/tests/recovery.rs tests graceful handling of broken, incomplete, or unusual input, using both strict parsing (an error is expected) and resilient parsing (recovery is expected).

Contract tests

src/tests/contract.rs and src/tests/consumability.rs verify that the SourcePackage contract is sufficient for downstream consumers. These tests cover iteration patterns, type navigation, serialization, filtering, merging, and programmatic construction.

Differential tests

src/tests/differential.rs documents the known differences between parc extraction and bic extraction, ensuring behavioral equivalence on standard declarations and explicitly documenting intentional divergences (pointer model, no ABI fields, typedef chain preservation).

Multi-file scan tests

src/tests/scan_multifile.rs covers multi-header scanning scenarios: include chains, multiple entry headers, cross-file struct references, conditional compilation, include guards, include directory resolution, and metadata population.

Adding new tests

A practical progression is:

  1. Add a parse_api unit test for the exact regression
  2. Add a reftest if you need a stable printed-tree expectation
  3. Add an extraction test if the issue is about declaration modeling
  4. Add a scan test if preprocessing or multi-file behavior matters
  5. Add a full-app fixture if the case needs a full filesystem layout

Cross-crate integration proof

parc library tests should not import linc or gerc.

Cross-crate proof belongs in:

  • linc tests/examples that ingest serialized or translated parc artifacts
  • gerc tests/examples that ingest translated source artifacts
  • external harnesses that exercise the full toolchain

That keeps parc’s own test suite focused on source meaning while still proving the larger pipeline elsewhere.

What “supported” means

For parc, support means:

  • the syntax path is covered by parser-facing tests
  • the extracted source meaning is covered by SourcePackage-level tests
  • the relevant limitations are documented honestly when behavior is partial or conservative

It does not mean:

  • every downstream consumer will accept the artifact unchanged
  • every hostile system header already has perfect preprocessing coverage
  • every parser-internal helper is part of the public contract

Diagnostics And Printing

PARC includes two pieces that are especially useful when building tools on top of the parser:

  • detailed parse errors
  • a tree printer for AST inspection

Direct parser diagnostics

The parse module returns ParseError:

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

match parse::expression("a +", Flavor::StdC11) {
    Ok(_) => {}
    Err(err) => {
        eprintln!("line: {}", err.line);
        eprintln!("column: {}", err.column);
        eprintln!("offset: {}", err.offset);
        eprintln!("expected: {:?}", err.expected);
    }
}
}

This is enough for:

  • editor error messages
  • parser regression tests
  • grammar debugging

Driver diagnostics

The driver adds preprocessor context on top:

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config, Error};

match parse(&Config::default(), "broken.c") {
    Ok(_) => {}
    Err(Error::PreprocessorError(err)) => {
        eprintln!("preprocessor failed: {}", err);
    }
    Err(Error::SyntaxError(err)) => {
        let (loc, includes) = err.get_location();
        eprintln!("syntax error in {}:{}:", loc.file, loc.line);
        eprintln!("column in preprocessed source: {}", err.column);
        for include in includes {
            eprintln!("included from {}:{}", include.file, include.line);
        }
    }
}
}

Formatting expected tokens

driver::SyntaxError also has format_expected, which is useful when building a custom human-readable error message.

AST printing

print::Printer is a visitor that renders the tree as an indented text dump.

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::print::Printer;
use parc::visit::Visit;

let parsed = parse(&Config::default(), "examples/sample.c")?;

let mut out = String::new();
Printer::new(&mut out).visit_translation_unit(&parsed.unit);
println!("{}", out);
Ok::<(), parc::driver::Error>(())
}

The printer is ideal when:

  • learning how PARC models a syntax form
  • updating reftests
  • debugging traversal code

A practical debugging loop

When a new syntax form is not behaving the way you expect:

  1. Parse the smallest reproducer with parse::*
  2. Print the AST with Printer
  3. Inspect spans on the nodes you care about
  4. Switch to driver if preprocessing is involved
  5. Map spans back to original files with loc

Project Layout

This chapter is for contributors and advanced users who want to understand where the parser logic lives.

Top-level crate layout

The repository is organized around a small public API surface and several internal support modules.

Path              Purpose
src/lib.rs        Public module exports
src/ir/           Source-level IR (SourcePackage, SourceType, etc.)
src/extract/      Declaration extraction from AST to IR
src/scan/         Header scanning (preprocess + parse + extract)
src/intake/       Preprocessed source intake
src/driver.rs     File-based parsing via external preprocessing
src/preprocess/   Built-in C preprocessor
src/parse.rs      Direct fragment parsing API
src/ast/          AST type definitions
src/visit/        Recursive visitor functions and trait
src/parser/       Parser implementation split by grammar area
src/loc.rs        Preprocessor line-marker location mapping
src/span.rs       Span and Node<T> wrappers
src/print.rs      AST debug printer
src/tests/        Test harnesses and integration-style tests

AST and visitor organization

The AST is split into focused files:

  • src/ast/declarations.rs
  • src/ast/expressions.rs
  • src/ast/statements.rs
  • src/ast/extensions.rs
  • src/ast/lexical.rs

The visitor layer mirrors that structure in src/visit/.

That symmetry is useful:

  • if you add a new AST node, you usually need a matching visitor hook
  • if you are looking for traversal behavior, the corresponding file is easy to find

Parser organization

The parser implementation is divided by grammar topics instead of one giant file. Examples include:

  • translation_units_and_functions.rs
  • declarations_entry.rs
  • declarators.rs
  • statements_iteration_and_jump.rs
  • casts_and_binary.rs
  • typeof_and_ts18661.rs

That split makes grammar work more localized.

Internal environment handling

Parsing depends on Env, which tracks parser state such as known typedef names and enabled syntax flavor. The public parse and driver APIs construct the right environment for you.

This matters because some C parses depend on whether an identifier is currently known as a typedef.

Testing layout

src/tests/ contains:

  • API tests
  • reftest harnesses
  • larger fixture harnesses
  • external/system-header related coverage

When changing parser behavior, expect to touch both narrow tests and corpus-style fixtures.

Contributor workflow

A good change sequence is:

  1. reproduce with the smallest possible parse::* input
  2. add or update a focused test
  3. inspect the tree with Printer
  4. patch the grammar or AST logic
  5. run make test

Why the parser is split this way

The parser is organized by syntax areas because C grammar work tends to be local but not trivial. That split helps with three things:

  • keeping grammar changes reviewable
  • matching failures to the right part of the parser quickly
  • reducing the chance that one large parser file becomes impossible to maintain

For example:

  • declaration bugs often land in declarations_entry.rs, declarators.rs, or related files
  • expression bugs often land in primary_and_generic.rs, casts_and_binary.rs, or nearby files
  • statement bugs often land in the statements_* files

Public versus internal boundaries

These are normal consumer-facing modules:

  • ir (primary data contract)
  • extract
  • scan
  • intake
  • driver
  • preprocess
  • parse
  • ast
  • visit
  • loc
  • span
  • print

These are implementation-oriented and should not be treated as a stable downstream boundary:

  • parser
  • env
  • astutil
  • strings

That distinction matters when you are extending the book or the crate API. Documentation should prefer the consumer-facing modules unless the chapter is specifically contributor-oriented.

API Contract

This chapter records the intended public consumer surface of parc.

It is not a blanket promise about every future change. It is the current guidance for how downstream tools should integrate with the crate without depending on parser internals or accidentally turning parc into a shared ABI owner for the rest of the pipeline.

First Principle

parc is the source-meaning layer of the pipeline: preprocessing, parsing, and source-level semantic extraction.

The intended downstream pattern is:

  1. scan headers or parse source via driver, scan, or parse
  2. extract normalized declarations via extract
  3. consume the SourcePackage IR from ir
  4. use visit, span, and loc to analyze AST-level details if needed

Downstream consumers that want source contracts should depend on parc::ir, not on parc::ast directly.

More importantly for this repository:

  • parc library code must not depend on linc or gerc
  • linc and gerc should not require parc as a library dependency in their production code paths
  • integration should happen through PARC-owned artifacts in tests/examples or external harnesses
  • there is no shared ABI crate that all three libraries depend on
  • there is no obligation to preserve discarded pipeline shapes for backward compatibility

Preferred public surface

These are the main consumer-facing modules:

Module            Role                                    Current expectation
parc::ir          source-level IR (SourcePackage)         preferred data contract
parc::extract     declaration extraction from AST         preferred extraction entry point
parc::scan        header scanning (preprocess + extract)  preferred high-level entry point
parc::intake      preprocessed source intake              preferred for already-preprocessed source
parc::driver      parse files and preprocessed source     preferred parse entry point
parc::preprocess  built-in C preprocessor                 preferred preprocessing entry point
parc::parse       parse string fragments directly         preferred low-level entry point
parc::ast         typed syntax tree                       internal data model
parc::visit       recursive traversal hooks               preferred traversal API
parc::span        byte-range metadata                     preferred location primitive
parc::loc         map offsets back to files/lines         preferred diagnostics helper
parc::print       AST debug dumping                       preferred inspection helper

Internal modules are not the contract

These modules are public only indirectly through behavior, not as a recommended downstream surface:

  • parser
  • env
  • astutil
  • strings

If a downstream tool depends directly on how those modules work, it is probably coupling itself to implementation details rather than the intended library boundary.

Normative consumer rules

If you are building on top of parc, the safest current rules are:

  1. use driver when preprocessing matters
  2. use parse::* for fragment parsing or already-controlled text inputs
  3. treat ir::SourcePackage as the primary output contract
  4. use visit for traversal instead of hand-rolling recursive descent everywhere
  5. use span and loc for diagnostics rather than guessing source positions
  6. do not rely on exact error-message strings for durable control flow
  7. do not treat PARC as semantic analysis, type checking, or ABI proof
  8. if another crate needs PARC output, serialize the PARC-owned artifact and translate it outside library code

What is part of the practical contract

Today the strongest practical contract is:

  • ir::SourcePackage, SourceType, SourceItem, and all IR types — the primary data contract
  • extract::extract_from_source, extract_from_translation_unit, parse_and_extract, parse_and_extract_resilient
  • scan::ScanConfig, scan_headers, ScanResult
  • intake::PreprocessedInput
  • ir::SourcePackageBuilder — programmatic package construction
  • driver::Config, Flavor, Parse, Error, SyntaxError, parse_builtin, and capture_macros
  • preprocess::{Processor, IncludeResolver, MacroTable, Lexer, preprocess, tokens_to_text, Target, define_target_macros}
  • parse::{constant, expression, declaration, statement, translation_unit, translation_unit_resilient}
  • the AST model under ast
  • the traversal hooks under visit
  • the span/location model under span and loc

Those are the surfaces the rest of the book assumes consumers will use.

The important point is this: PARC has two practical contracts today, not one:

  1. a source-contract path centered on ir, extract, and scan
  2. a parser-facing path centered on driver, parse, ast, and visit

The docs should not pretend the AST side does not exist, because the crate very much exposes it.

What is intentionally weaker

The following should be treated as less stable than the core parsing surface:

  • exact debug formatting of AST values
  • exact Display wording of parse errors
  • internal parser file layout under src/parser/
  • incidental ordering of implementation helper functions

These details are useful for debugging and contribution work, but they are not the main consumer contract.

Explicit non-goals

The current contract does not promise:

  • semantic name resolution beyond parsing decisions such as typedef handling
  • type checking
  • ABI compatibility guarantees
  • full support for every GCC or Clang extension
  • preservation of raw macro definitions beyond what capture_macros provides

Those are outside the scope of PARC as a source frontend.

Downstream posture

For long-lived integrations, the safest posture is:

  1. use scan or extract as your primary entry point — these produce SourcePackage
  2. consume ir::SourcePackage rather than raw AST types where possible
  3. use driver and parse only when you need AST-level access
  4. treat unsupported syntax and parser errors as normal outcomes
  5. keep tests with representative preprocessed inputs for the syntax families you depend on
  6. keep cross-package translation in tests/examples/harnesses rather than adding library dependencies
  7. see Migration From bic if you are transitioning from bic

End-To-End Workflows

This chapter ties the public modules together into practical usage patterns.

Workflow 1: Parse A Real C File

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};

let parsed = parse(&Config::default(), "include/demo.h")?;
println!("items: {}", parsed.unit.0.len());
Ok::<(), parc::driver::Error>(())
}

This is the baseline path when:

  • includes matter
  • macros matter
  • compiler predefined types or macros matter

The result gives you both the AST and the exact preprocessed source PARC saw.

Workflow 2: Parse A Preprocessed Snapshot

#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};

let source = std::fs::read_to_string("snapshots/demo.i").unwrap();
let parsed = parse_preprocessed(&Config::default(), source)?;
Ok::<(), parc::driver::SyntaxError>(())
}

Use this when:

  • reproducing a parse bug
  • building deterministic tests
  • integrating with a nonstandard build system

This workflow isolates parser behavior from preprocessor invocation behavior.

Workflow 3: Parse A Fragment In Tests

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let decl = parse::declaration("typedef unsigned long word_t;", Flavor::StdC11)?;
let expr = parse::expression("ptr->field + 1", Flavor::GnuC11)?;
Ok::<(), parc::parse::ParseError>(())
}

This is the right workflow for:

  • unit tests
  • grammar debugging
  • editor or language-server experiments

Workflow 4: Build A Syntax Analyzer

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::visit::{self, Visit};
use parc::{ast, span};

struct ReturnCounter {
    count: usize,
}

impl<'ast> Visit<'ast> for ReturnCounter {
    fn visit_statement(&mut self, node: &'ast ast::Statement, span: &'ast span::Span) {
        if matches!(node, ast::Statement::Return(_)) {
            self.count += 1;
        }
        visit::visit_statement(self, node, span);
    }
}

let parsed = parse(&Config::default(), "src/main.c")?;
let mut counter = ReturnCounter { count: 0 };
counter.visit_translation_unit(&parsed.unit);
println!("return statements: {}", counter.count);
Ok::<(), parc::driver::Error>(())
}

This is the normal PARC analyzer pattern:

  1. parse
  2. traverse
  3. inspect spans and locations
  4. emit your own diagnostics or analysis data

Workflow 5: Build Diagnostics With Real File Locations

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::loc::get_location_for_offset;

let parsed = parse(&Config::default(), "src/main.c")?;

for item in &parsed.unit.0 {
    let (loc, _) = get_location_for_offset(&parsed.source, item.span.start);
    println!("top-level item starts at {}:{}", loc.file, loc.line);
}
Ok::<(), parc::driver::Error>(())
}

Use this when your users care about original file locations rather than raw byte offsets in the preprocessed stream.

Workflow 6: Debug A New Syntax Form

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
use parc::print::Printer;
use parc::visit::Visit;

let expr = parse::expression("({ int x = 1; x + 1; })", Flavor::GnuC11)?;

let mut out = String::new();
Printer::new(&mut out).visit_expression(&expr.node, &expr.span);
println!("{}", out);
Ok::<(), parc::parse::ParseError>(())
}

This is the most effective loop when exploring unfamiliar AST shapes.

Workflow 7: Regression-Test A Parse Failure

A practical bug workflow is:

  1. capture the smallest failing input
  2. decide whether preprocessing is relevant
  3. add a parse_api test or a reftest
  4. patch the grammar
  5. verify the printed AST or error outcome

That keeps parser changes concrete and reviewable.

Error Surface

This chapter describes the error model PARC exposes today.

Two layers of errors

PARC has two main error surfaces:

  1. direct parser errors from parse
  2. driver errors from driver

The distinction is important because the driver includes external preprocessing.

Direct parser errors

The parse module returns:

#![allow(unused)]
fn main() {
Result<T, parc::parse::ParseError>
}

ParseError includes:

  • line
  • column
  • offset
  • expected

This error means:

  • the parser could not consume the full input
  • the failure happened at the given position
  • one of the listed tokens or grammar expectations would have allowed parsing to continue

Driver errors

The driver module returns:

#![allow(unused)]
fn main() {
Result<parc::driver::Parse, parc::driver::Error>
}

That error enum has two branches:

  • PreprocessorError(io::Error)
  • SyntaxError(SyntaxError)

This split is a real contract boundary:

  • preprocessor failures mean PARC never reached parsing
  • syntax failures mean preprocessing succeeded and PARC failed on the resulting text

SyntaxError

driver::SyntaxError contains:

  • source
  • line
  • column
  • offset
  • expected

It also provides:

  • get_location() to map back to source files and include stack
  • format_expected() for user-facing token formatting

What consumers should key on

For durable control flow, consumers should branch on:

  • error type
  • structured fields such as line, column, and expected

Consumers should not branch on:

  • exact human-readable Display text
  • incidental token ordering inside formatted strings

Practical examples

Fragment parsing

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

match parse::statement("if (x) {", Flavor::StdC11) {
    Ok(_) => {}
    Err(err) => {
        eprintln!("statement parse failed at {}:{}", err.line, err.column);
    }
}
}

File parsing

#![allow(unused)]
fn main() {
use parc::driver::{parse, Config, Error};

match parse(&Config::default(), "broken.c") {
    Ok(_) => {}
    Err(Error::PreprocessorError(err)) => {
        eprintln!("preprocessor failure: {}", err);
    }
    Err(Error::SyntaxError(err)) => {
        let (loc, includes) = err.get_location();
        eprintln!("syntax failure in {}:{} ({})", loc.file, loc.line, err.column);
        eprintln!("include depth: {}", includes.len());
    }
}
}

Resilient parsing

parse::translation_unit_resilient provides error recovery. When a declaration fails to parse, it skips to the next synchronization point (; at file scope or } at brace depth zero) and continues parsing.

#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

let source = "int good;\n$broken$;\nint also_good;\n"; // sample input with one unparseable declaration
let tu = parse::translation_unit_resilient(source, Flavor::GnuC11);
// tu.0 contains all successfully parsed declarations
// unparseable regions are silently skipped
}

Use this when you want partial results from files that contain unsupported syntax. The strict translation_unit function is still preferred when you need to detect all errors.

Failure-model guidance

Downstream tools should treat parse failures as normal, reportable outcomes.

That means:

  • do not crash just because one translation unit fails
  • surface the structured error data to the caller
  • retain the preprocessed source when debugging hard failures

Explicit limitations of the current error model

The current model does not provide:

  • semantic diagnostics
  • fix-it suggestions
  • a typed taxonomy for every grammar category of failure
  • warning channels separate from parse success

PARC’s errors are syntax-oriented rather than compiler-like.

Flavor And Extension Support

PARC supports three language flavors and several extension families.

This chapter records what that means in practice.

Flavors

Flavor      Intent
StdC11      strict C11 parsing
GnuC11      C11 plus GNU-oriented syntax
ClangC11    C11 plus Clang-oriented syntax

Use the flavor that matches the syntax you expect in the input.

Why flavor matters

Some C parses are ambiguous or extension-specific.

Examples include:

  • GNU statement expressions
  • typeof
  • GCC-style attributes
  • GNU asm statements
  • Clang availability attributes

If you parse extension-heavy source in StdC11, errors are expected.
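As a sketch, the same GNU statement expression can be fed to two flavors to make the mismatch visible. This assumes `parse::translation_unit` is the strict string-parsing counterpart of `translation_unit_resilient` described later in this chapter:

```rust
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

// GNU statement expression: extension syntax, not strict C11.
let src = "int x = ({ int t = 1; t + 1; });";

// Errors are expected under StdC11 for extension-heavy source...
if parse::translation_unit(src, Flavor::StdC11).is_err() {
    eprintln!("rejected under StdC11, as expected for GNU syntax");
}
// ...while GnuC11 models statement expressions explicitly.
let _ = parse::translation_unit(src, Flavor::GnuC11);
}
```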

GNU-oriented support

The AST and parser explicitly model GNU-oriented syntax such as:

  • typeof
  • statement expressions
  • GNU asm statements
  • asm labels
  • attributes
  • designated range initializers

In practice, if the source is GCC-flavored or Linux-kernel-like, GnuC11 is usually the right starting point.

Clang-oriented support

PARC also models Clang-specific or Clang-common syntax including:

  • Clang availability attributes
  • the ClangC11 flavor path in driver and parse

If your preprocessing and syntax assumptions are built around Clang, use Config::with_clang() or Flavor::ClangC11.

C23 keyword support

PARC accepts the following C23 keywords in all flavors, because modern compilers (GCC 15+) emit them in preprocessed output by default:

C23 keyword      C11 equivalent     Notes
bool             _Bool              type specifier
true             1                  parsed as integer constant
false            0                  parsed as integer constant
nullptr          0                  parsed as integer constant
static_assert    _Static_assert     declaration
alignas          _Alignas           alignment specifier
alignof          _Alignof           expression
thread_local     _Thread_local      storage class
constexpr        (none)             storage class specifier
typeof           __typeof__         type specifier (was GNU-only)
_BitInt(N)       (none)             type specifier with width
noreturn         _Noreturn          function specifier
complex          _Complex           type specifier
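A minimal sketch of that acceptance, assuming the strict `parse::translation_unit` string entry point:

```rust
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;

// C23 spellings that GCC 15+ can leave in preprocessed output.
let src = r#"
static_assert(1, "size check");
_Bool legacy = 0;
bool modern = true;
"#;

// These keywords are accepted in all flavors, including strict StdC11.
let _ = parse::translation_unit(src, Flavor::StdC11);
}
```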

GCC extension types

PARC recognizes these GCC extension types in GNU mode:

Type                 AST variant                  Notes
__int128             TypeSpecifier::Int128        non-unique, combinable with signed/unsigned
__float128           TypeSpecifier::Float128      unique type specifier
__builtin_va_list    typedef                      handled as built-in typedef name

Standard-mode guidance

Use StdC11 when:

  • you want to reject vendor syntax deliberately
  • your test corpus is intended to stay close to the standard
  • you want parser behavior that is easier to reason about across compilers

Practical consumer policy

A useful integration policy is:

  1. default to the compiler family you actually preprocess with
  2. add tests for the specific extension families you rely on
  3. treat unsupported extensions as explicit parser limitations, not random bugs

What this chapter does not claim

This chapter does not claim exhaustive support for every extension accepted by GCC or Clang.

It does claim that PARC has explicit support for several important extension families and that the flavor setting is part of the API contract for using them correctly.

Unsupported Cases

This chapter records the important unsupported or intentionally out-of-scope areas.

The goal is to prevent downstream users from mistaking absence of detail for implicit support.

It also acts as the current frontend-family closure ledger. Every hard family should fit into one of these buckets:

  • fully supported
  • resilient-only support
  • diagnostics-only improvement
  • intentional rejection

For the Level 1 production claim, this ledger is part of the real contract. If a family is not classified here, it should not be treated as in-scope production behavior.

Frontend-Family Closure Ledger

The current important families are:

  • K&R function declarations: diagnostics-only improvement. PARC preserves the function surface and emits explicit unsupported diagnostics.
  • block pointers: intentional rejection. They still fail in parsing; current work is about sharper diagnostics, not pretending they lower cleanly.
  • bitfield-heavy records: resilient-only support. PARC keeps record shape and bit widths, but layout truth remains partial.
  • vendor attributes and calling-convention attributes: resilient-only support. PARC preserves the declaration and emits partial diagnostics when attributes are ignored.
  • macro-heavy include stacks: fully supported on current canonical corpora. The canonical corpora are the proof surface; more corpora still need to land before claiming broad closure.
  • hostile include-order and typedef-chain environments: fully supported on current canonical corpora. Treat this as corpus-backed support, not universal extension parity.

This ledger is intentionally blunt:

  • if a family is not yet honestly representable, reject it
  • if a family is only partially representable, say so
  • if a family is only proven on named corpora, document that exact scope

The Level 1 production envelope is Linux/ELF-first and corpus-backed. That means “supported” here should be read as one of:

  • fully supported within the named canonical corpus
  • partially supported with explicit diagnostics
  • rejected explicitly as out of scope

Semantic analysis

PARC does not provide:

  • full name resolution
  • type checking
  • constant folding as a stable analysis contract
  • ABI or layout proof
  • compiler-quality warnings

It is a parser with source-structure support, not a complete compiler frontend.

Preprocessing

PARC does not implement a standalone C preprocessor in the driver path.

Instead it depends on an external preprocessor command such as:

  • gcc -E
  • clang -E

That means PARC does not try to normalize every compiler’s preprocessing behavior internally.

The built-in preprocessor is increasingly useful for scan-first workflows, but it is still a scoped compatibility surface rather than a promise of universal host-header parity.

Extension completeness

PARC supports several GNU and Clang extensions, but the project does not promise complete parity with every extension accepted by modern GCC or Clang releases.

Downstream tools should not assume:

  • full GNU extension completeness
  • full Clang extension completeness
  • identical acceptance behavior across all compiler-version-specific syntax edges

Macro inventory and expansion modeling

PARC parses the post-preprocessing result. It does not expose a first-class macro inventory or a stable semantic model of macro definitions as its own output contract.

If you need macro capture as data, that is outside PARC’s current scope.

Translation-unit semantics

PARC can parse translation units, but it does not guarantee:

  • cross-file symbol resolution
  • duplicate-definition analysis as a stable feature
  • semantic correctness of declarations
  • linkability of parsed declarations

Those tasks belong to later analysis layers, not the parser itself.

Diagnostics depth

PARC does not currently provide:

  • warning classes
  • fix-it suggestions
  • rich categorized error codes
  • a stable diagnostic JSON schema

The current error model is strong enough for syntax handling, not full compiler UX.

The practical rule for the remaining hard families is:

  • if PARC can keep a trustworthy declaration surface, it should do so and emit diagnostics
  • if PARC cannot keep a trustworthy declaration surface, it should reject the construct explicitly

Consumer guidance

Downstream tools should treat these gaps as explicit non-guarantees.

That means:

  • build policy around syntax success and failure, not semantic certainty
  • isolate extension-heavy assumptions behind tests
  • keep representative preprocessed fixtures for any hard parser dependency
  • treat the closure ledger above as part of the real contract, not as a vague future roadmap

Reproducibility

Parsing C is sensitive to the exact preprocessor environment.

This chapter documents how to keep PARC-based workflows reproducible.

Main reproducibility risks

The biggest sources of drift are:

  • different preprocessor executables
  • different default include paths
  • different predefined macros
  • different parser flavor settings
  • different preprocessed snapshots in tests

Best practices

For durable automation:

  1. prefer explicit Config values over ambient defaults in CI
  2. pin include paths with -I... when they matter
  3. use -nostdinc for isolated fixture testing when appropriate
  4. keep preprocessed snapshots for hard parser regressions
  5. keep the parser flavor explicit in tests

Deterministic parse debugging

If a real file parse is inconsistent across machines, a strong debugging move is:

  1. capture the preprocessed output
  2. switch the failing test to parse_preprocessed
  3. debug PARC against the stable snapshot

That separates:

  • preprocessing differences
  • parser differences
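The snapshot step can be sketched like this. The exact location and signature of `parse_preprocessed` are assumptions based on the name used above, and the fixture path is hypothetical:

```rust
#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};

// A pinned .i snapshot captured once with `gcc -E` and checked in.
// (Hypothetical fixture path.)
let snapshot = std::fs::read_to_string("fixtures/regression.i").unwrap();

// Debug the parser against the stable snapshot, with no host
// preprocessor in the loop.
let parsed = parse_preprocessed(&Config::default(), &snapshot);
}
```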

Reftests and snapshots

The reftest harness already encourages deterministic expectations by comparing against printed AST output. For parser bugs that depend on preprocessing, a pinned .i file is often even better.

Consumer guidance

If PARC is part of a larger pipeline, keep the following recorded somewhere durable:

  • preprocessor executable
  • preprocessor arguments
  • flavor
  • representative fixtures
  • expected parse outcome

Without that context, debugging parser regressions is much slower.

Stable Usage Patterns

This chapter records usage patterns that are safest for downstream consumers.

Pattern 1: Separate parsing from analysis

A durable integration pattern is:

  1. parse with PARC
  2. convert the AST into your own analysis model if needed
  3. run later semantic or policy logic on that model

This avoids coupling too much of your tool to every detail of PARC’s raw AST layout.

Pattern 2: Preserve preprocessed source for diagnostics

If you use driver, keep Parse::source around for as long as you may need diagnostics.

That enables:

  • mapping spans back to files and lines
  • debugging parser failures later
  • reproducing failures from stored snapshots
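A sketch of that retention, assuming `Parse::source` holds the preprocessed text as a string (the file name is hypothetical; the call shape follows the driver example earlier in the book):

```rust
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};

// Keep the whole Parse value while diagnostics may still be needed,
// instead of immediately discarding everything but the AST.
let parsed = parse(&Config::default(), "input.c").unwrap();

// The retained preprocessed text supports span mapping and
// snapshot-based reproduction later.
let preprocessed_len = parsed.source.len();
}
```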

Pattern 3: Make flavor explicit

Even when defaults are convenient, explicit flavor choices are easier to maintain in tools and tests.

Prefer:

  • Flavor::StdC11 for strict grammar tests
  • Flavor::GnuC11 when GNU syntax is intentional
  • Flavor::ClangC11 when Clang-specific syntax is intentional

Pattern 4: Test the syntax you depend on

If your downstream tool depends on a specific syntax family, keep representative tests for it.

Examples:

  • function-pointer declarators
  • designated initializers
  • GNU statement expressions
  • inline asm
  • availability attributes

Pattern 5: Treat parse failure as data

A mature integration does not assume every input will parse. It treats parse failure as a structured, reportable outcome.

That means:

  • returning parse diagnostics to the caller
  • logging the failing source context when appropriate
  • keeping failure fixtures in the test corpus

Pattern 6: Prefer local traversal hooks

When building analyzers, override the narrowest useful visitor hook instead of one huge catch-all traversal method.

That makes the analysis easier to maintain as the AST evolves.
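As an illustrative sketch only: the trait and hook names below (`visit::Visitor`, `visit_function_definition`, `ast::FunctionDefinition`) are assumptions about the `visit` module's shape, not confirmed API:

```rust
#![allow(unused)]
fn main() {
use parc::ast;
use parc::visit;

// Count function definitions and nothing else, by overriding only
// the narrowest relevant hook instead of a catch-all traversal.
struct FnCounter {
    count: usize,
}

impl visit::Visitor for FnCounter {
    fn visit_function_definition(&mut self, _def: &ast::FunctionDefinition) {
        self.count += 1;
    }
}
}
```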

Contributor Workflow

This chapter records a practical workflow for changing parc safely.

Smallest-reproducer rule

When fixing or extending the parser, start with the smallest input that demonstrates the issue.

That input should usually be one of:

  • a direct parse::* snippet
  • a reftest file
  • a preprocessed snapshot

This keeps parser work focused. The standard loop is:

  1. reproduce the issue with the smallest possible input
  2. decide whether the right test layer is parse_api, reftest, or full-app style
  3. inspect the AST or failure position with Printer or structured errors
  4. patch the relevant parser module
  5. rerun the focused tests
  6. only then widen out to broader test coverage

Choosing the right test layer

Use parse_api tests when:

  • the bug is a simple grammar acceptance issue
  • you only need a success/failure assertion

Use reftests when:

  • tree shape matters
  • printer output is the clearest regression oracle

Use preprocessed or full-app style fixtures when:

  • includes or macro expansion are part of the problem
  • driver behavior matters

Grammar-oriented debugging

A good parser debugging loop is:

  1. isolate the failing syntax
  2. parse with the right flavor
  3. inspect the closest AST shape that already works
  4. patch the grammar in the most local parser file possible

This is usually better than broad speculative rewrites.

AST changes

If you add or change an AST node, review the corresponding surfaces too:

  • visitor hooks in visit
  • printer behavior in print
  • any book examples that describe the shape
  • reftest expectations if printer output changed

Documentation changes

If a syntax family becomes better supported, update the book at the same time. The important places are usually:

  • flavor/extension guidance
  • unsupported cases
  • workflows
  • AST or visitor examples

That keeps the book aligned with the real parser contract.

Boundary rule

When changing parc, keep the ownership split explicit:

  • parc owns preprocessing, parsing, extraction, and source artifacts
  • parc does not own link evidence or Rust lowering
  • do not document parser internals as if they were a shared ABI for the rest of the pipeline

If a change makes the source artifact richer, document the richer source meaning directly instead of hinting that downstream crates depend on parc library internals.

Maintenance rule

The maintenance bar is simple:

  1. add or tighten the smallest useful test first
  2. keep public contract docs and examples in the same patch
  3. prefer deleting stale workflow language over preserving it for history
  4. do not keep dead compatibility stories in the book

Support Tiers

This chapter records a practical support posture for PARC’s public surface.

It is meant to help downstream users judge which parts of the crate are the safest long-term integration points.

Tier 1: Core Consumer Surface

These are the most important public surfaces to depend on:

  • driver
  • parse
  • ast
  • visit
  • span
  • loc

These modules define the main parsing contract of the crate.

Tier 2: Debugging And Inspection Surface

These are public and useful, but more inspection-oriented than contract-critical:

  • print
  • Debug views of AST nodes
  • formatted error text

They are valuable for debugging and tests, but long-lived tooling should still prefer structured data over formatted strings.

Tier 3: Contributor-Oriented Knowledge

These are important for contributors but should not be treated as downstream contracts:

  • parser file organization under src/parser/
  • helper-module layout
  • incidental internal naming
  • current implementation decomposition across grammar files

These details may evolve as the parser changes.

Consumer guidance

If you are building external tooling on top of PARC, bias toward Tier 1 surfaces first. Reach for Tier 2 when you need diagnostics or debugging support. Treat Tier 3 as implementation detail unless you are actively contributing to PARC itself.

Hardening Matrix

This chapter translates the large PARC test surface into an explicit hardening ladder.

The important point is not “how many tests exist”. The important point is which surfaces are carrying confidence for real-header parsing, preprocessing, and source extraction.

How To Read The Matrix

Read each surface on three axes:

  • hermetic or host-dependent
  • parser-only versus scan-first
  • success path versus conservative failure path

A surface is stronger when it is:

  • hermetic
  • scan-first
  • repeated deterministically
  • tied to a realistic system or library family

Tier 1: Hermetic Canonical Baselines

These are the first surfaces that should stay green on every machine:

  • vendored musl stdint
  • vendored zlib
  • vendored libpng builtin-preprocessor success path
  • repo-owned macro_env_a hostile macro corpus
  • repo-owned type_env_b hostile type corpus
  • parser and extraction corpus fixtures under src/tests/**

These matter because they exercise:

  • multi-header scanning
  • macro and include handling
  • extraction into SourcePackage
  • deterministic behavior without relying on the host toolchain layout

Tier 2: Host-Dependent Canonical Ladders

These should stay green on developer and CI hosts where the headers exist, but they are not the first portability baseline:

  • OpenSSL public wrapper extraction
  • combined Linux event-loop wrapper extraction
  • larger libc and system-header clusters

These surfaces matter because they are closer to the “real ugly header world” target than the small synthetic fixtures.

Tier 3: Hostile And Conservative-Failure Surfaces

These prove that PARC is refusing or degrading honestly instead of pretending to understand everything:

  • hostile declaration fixtures
  • repo-owned hostile corpora that force builtin-preprocessor macro and typedef expansion
  • recovery fixtures
  • unsupported or partial declaration families that still emit diagnostics and partial metadata
  • extraction-status summaries that distinguish supported, partial, and unsupported output trust

For release purposes, these failures are good when they are:

  • deterministic
  • diagnostic
  • documented

Determinism Anchors

The most important repeat-run anchors right now are:

  • vendored musl scan
  • vendored zlib scan
  • vendored libpng scan
  • macro_env_a scan
  • type_env_b scan
  • OpenSSL wrapper extraction
  • combined Linux event-loop wrapper extraction

If any of those become unstable, the release posture should drop immediately.

What This Matrix Does Not Mean

This matrix does not mean:

  • every random system library now parses perfectly
  • every preprocessor corner is solved
  • every large host-dependent surface is equally mature

It means the current confidence ladder is explicit instead of implied.

Parser Boundaries

This chapter explains where PARC starts and where it intentionally stops.

PARC owns syntax parsing

PARC is responsible for:

  • accepting supported C syntax
  • building an AST
  • carrying spans
  • mapping parse positions back through preprocessor line markers

That is the core boundary of the crate.

PARC does not own full compilation

PARC does not attempt to be:

  • a full preprocessor implementation
  • a type checker
  • a linker-aware analyzer
  • a code generator
  • a full semantic compiler frontend

These are not accidental omissions. They are part of the intended scope boundary.

Practical layering

A healthy toolchain boundary looks like this:

  1. a compiler or preprocessor produces acceptable input
  2. PARC parses it
  3. a later layer performs semantic analysis, policy checks, or code generation

This keeps PARC focused on syntax and source structure.

Why this matters for consumers

If a downstream tool needs:

  • ABI guarantees
  • linker truth
  • semantic type equivalence
  • macro inventories as data

then PARC should be one component in the pipeline, not the whole pipeline.

Why this matters for contributors

When deciding whether a new feature belongs in PARC, a useful question is:

“Does this improve PARC’s syntax parsing and source-structure contract, or does it drag PARC into a later compiler stage?”

If it is mostly a later-stage concern, it probably belongs outside PARC.

Release Checklist

This chapter is a pragmatic checklist for documentation and parser changes before a release.

The important release posture is architectural:

  • parc releases source/frontend behavior
  • it does not release binary or Rust-generation policy
  • the tested SourcePackage contract matters more than parser-internal churn

Parser changes

Before releasing parser changes:

  1. confirm the smallest reproducer has a test
  2. confirm the intended flavor coverage is tested
  3. confirm the AST shape change is deliberate
  4. confirm visitor and printer behavior still make sense

Book changes

Before releasing documentation changes:

  1. confirm the affected public behavior is described in the book
  2. confirm unsupported or out-of-scope cases are still documented honestly
  3. confirm examples still match the actual public API names

Error-surface changes

Before releasing changes around errors:

  1. confirm structured fields still provide the needed information
  2. avoid treating formatted strings as the real contract
  3. update the error-surface chapter if the practical behavior changed

Workflow changes

Before releasing changes to the normal integration path:

  1. update the workflow chapter
  2. update the API contract chapter if the preferred boundary changed
  3. update stable-usage guidance if downstream posture should change

Artifact contract changes

Before releasing a SourcePackage shape change:

  1. confirm the changed field meaning is covered by contract-level tests
  2. confirm the consuming workflow examples still describe artifact boundaries
  3. confirm cross-crate composition is still described as tests/examples/harness work, not library coupling

Release gate

parc is ready to release only when:

  • make build passes
  • make test passes
  • the canonical hardening surfaces are still green
    • vendored musl stdint
    • vendored zlib
    • vendored libpng scan
    • OpenSSL public wrapper extraction
    • libcurl public wrapper extraction
    • combined Linux event-loop wrapper extraction
  • deterministic repeated extraction still holds on the canonical large surfaces
  • the book still teaches parc as the source-meaning crate
  • unsupported or partial source behavior is still documented honestly

Final practical rule

If a change would force a downstream PARC consumer to rethink how it parses, traverses, or reports on source, the book should say so explicitly in the same change.

Readiness Scorecard

This chapter ties PARC readiness to real suites instead of vague confidence claims.

Overall Posture

PARC should currently be read as:

  • strong on parser and extraction fundamentals
  • strong on scan-first vendored baselines
  • materially stronger on hostile real-world builtin-preprocessor corners
  • intentionally conservative when a large header family cannot be modeled honestly

For Level 1 production, PARC should be read as Linux/ELF-first and canonical-corpus-backed, not as a universal frontend for arbitrary C headers.

For whole-pipeline claims, this score is also capped by downstream gerc anchors that ingest translated PARC source surfaces in tests/examples.

That is good progress, but it is not the same thing as “finished for every C header in the wild”.

Subsystem Scorecard

  • parser entrypoints: high
  • AST traversal and printing: high
  • extraction to SourcePackage: high
  • scan-first vendored baselines: high
  • hostile-header recovery: medium-high
  • built-in preprocessor coverage on ugly system headers: medium-high
  • large host-dependent wrapper extraction: medium-high
  • deterministic behavior on canonical large surfaces: high

Canonical Readiness Anchors

The release posture should be judged against these anchors first:

  • vendored musl stdint
  • vendored zlib
  • vendored libpng scan
  • repo-owned macro_env_a
  • repo-owned type_env_b
  • OpenSSL public wrapper extraction
  • combined Linux event-loop wrapper extraction

If those anchors stay green and deterministic, PARC is earning trust. If they drift, the scorecard should be lowered even if many smaller tests still pass.

What Would Raise Readiness Further

The next meaningful gains would be:

  • broader built-in-preprocessor coverage on other hostile width and platform gates beyond the libpng family
  • more ugly combined system-header clusters
  • more repeat-run deterministic scans on large host-dependent surfaces
  • clearer unsupported-case diagnostics for the remaining difficult families

Migration From bic

This chapter documents how to migrate downstream consumers from bic’s frontend extraction to parc’s SourcePackage contract.

Why migrate

parc now owns source-level declaration extraction. bic’s extract.rs was the legacy location for this logic. The canonical path is now:

C headers  ->  parc::scan / parc::extract  ->  SourcePackage  ->  downstream

bic should consume parc::ir::SourcePackage instead of owning its own extraction.

Type mapping

bic type                 parc type                Notes
BindingPackage           SourcePackage            parc has no layouts, link, or bic_version
BindingItem              SourceItem               Same variant set
BindingType              SourceType               Pointer model differs (see below)
FunctionBinding          SourceFunction           Identical structure
ParameterBinding         SourceParameter          Identical structure
RecordBinding            SourceRecord             No representation or abi_confidence
FieldBinding             SourceField              No layout field
EnumBinding              SourceEnum               Identical structure
TypeAliasBinding         SourceTypeAlias          No canonical_resolution
VariableBinding          SourceVariable           Identical structure
UnsupportedItem          SourceUnsupported        Identical structure
CallingConvention        CallingConvention        parc version includes Unknown(String)
TypeQualifiers           TypeQualifiers           Identical structure
BindingTarget            SourceTarget             Identical structure
BindingInputs            SourceInputs             Identical structure
BindingDefine            SourceDefine             Identical structure
MacroBinding             SourceMacro              parc drops function_like and category
DeclarationProvenance    DeclarationProvenance    Identical structure
MacroProvenance          MacroProvenance          Identical structure

Pointer model difference

bic:

#![allow(unused)]
fn main() {
Pointer {
    pointee: Box<BindingType>,
    const_pointee: bool,      // whether pointee is const
    qualifiers: TypeQualifiers, // qualifiers on the pointer itself
}
}

parc:

#![allow(unused)]
fn main() {
Pointer {
    pointee: Box<SourceType>,
    qualifiers: TypeQualifiers, // is_const means pointee is const
}
}

In parc, qualifiers.is_const on a Pointer indicates that the pointee is const-qualified. Use SourceType::const_ptr(inner) and SourceType::ptr(inner) as constructors.

Missing fields in parc

These bic fields are intentionally absent from parc because they belong to the link/ABI layer:

  • FieldBinding.layout (field offset) — use LINC probing
  • RecordBinding.representation — use LINC probing
  • RecordBinding.abi_confidence — use LINC validation
  • TypeAliasBinding.canonical_resolution — parc preserves TypedefRef chains
  • BindingPackage.layouts — use LINC probing
  • BindingPackage.link — use LINC link surface
  • BindingPackage.effective_macro_environment — use LINC macro analysis

Migration steps

Step 1: Replace extraction call

Before:

#![allow(unused)]
fn main() {
use bic::extract::Extractor;
use bic::ir::BindingPackage;

let extractor = Extractor::new();
let (items, diagnostics) = extractor.extract(&unit);
let mut pkg = BindingPackage::new();
pkg.items = items;
}

After:

#![allow(unused)]
fn main() {
use parc::extract;
use parc::ir::SourcePackage;

let pkg = extract::extract_from_translation_unit(&unit, Some("header.h".into()));
}

Or for end-to-end scanning:

#![allow(unused)]
fn main() {
use parc::scan::{ScanConfig, scan_headers};

let config = ScanConfig::new()
    .entry_header("header.h")
    .with_builtin_preprocessor();
let result = scan_headers(&config).unwrap();
let pkg: &SourcePackage = &result.package;
}

Step 2: Update type references

Replace all uses of BindingType with SourceType, BindingItem with SourceItem, etc. The variant names are identical.

Step 3: Handle pointer model

Replace const_pointee checks:

#![allow(unused)]
fn main() {
// Before (bic)
if let BindingType::Pointer { const_pointee: true, .. } = ty { ... }

// After (parc)
if let SourceType::Pointer { qualifiers, .. } = ty {
    if qualifiers.is_const { ... }
}
}

Step 4: Remove ABI fields

Any code that reads FieldBinding.layout, RecordBinding.representation, or RecordBinding.abi_confidence should be moved to LINC’s domain.

Step 5: Use builder for programmatic construction

#![allow(unused)]
fn main() {
use parc::ir::{SourcePackageBuilder, SourceItem, SourceFunction, ...};

let pkg = SourcePackageBuilder::new()
    .source_path("api.h")
    .item(SourceItem::Function(func))
    .item(SourceItem::Record(rec))
    .build();
}

API reference

Key public APIs for downstream consumers:

  • parc::extract::extract_from_source(src) — parse and extract
  • parc::extract::extract_from_translation_unit(unit, path) — extract from AST
  • parc::extract::parse_and_extract(src, flavor) — with flavor control
  • parc::extract::parse_and_extract_resilient(src, flavor) — with error recovery
  • parc::scan::scan_headers(config) — end-to-end header scanning
  • parc::ir::SourcePackage — the contract type
  • parc::ir::SourcePackageBuilder — programmatic construction
  • SourcePackage::retain_items(pred) — filter items
  • SourcePackage::merge(other) — combine packages
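A short sketch of the last two calls together, assuming `merge` takes the other package by value and `retain_items` takes a predicate over `SourceItem`:

```rust
#![allow(unused)]
fn main() {
use parc::ir::{SourceItem, SourcePackage};

fn functions_only(mut base: SourcePackage, extra: SourcePackage) -> SourcePackage {
    // Combine two scan results, then keep only the function items.
    base.merge(extra);
    base.retain_items(|item| matches!(item, SourceItem::Function(_)));
    base
}
}
```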