PARC Reference
PARC is the source frontend of the toolchain. The real crate surface today is:
- preprocessing through both external-driver and built-in paths
- C parsing into a typed AST
- extraction into a durable source IR
- header scanning that goes straight to SourcePackage
- AST-oriented support APIs such as visiting, spans, locations, and printing
That means the crate serves two audiences at once:
- downstream tools that want parc::ir::SourcePackage
- parser-facing tools that want direct AST access
What PARC Owns
- preprocessing
- parsing
- parser recovery
- source extraction
- source diagnostics and provenance
- source IR
- header scanning
- AST traversal and debug support
What PARC Does Not Own
- symbol inventories
- binary validation
- link-plan construction
- Rust lowering or crate emission
Actual Data Flow
raw source / headers
-> driver or built-in preprocessor
-> parser AST
-> extraction
-> SourcePackage
-> serialized source artifact or downstream harness
scan short-circuits that flow into one high-level operation. parse and
driver expose earlier stages for syntax-level consumers.
Module Layout
| Module | What it is actually for |
|---|---|
| driver | file-oriented parse flow using an external preprocessor |
| preprocess | built-in preprocessing, tokenization, include resolution |
| parse | fragment parsing and direct translation-unit parsing from strings |
| scan | end-to-end header scanning into SourcePackage |
| extract | AST-to-IR lowering and normalization |
| ir | durable PARC-owned source contract |
| ast | syntax tree for parser-facing consumers |
| visit | traversal hooks over the AST |
| span / loc | source-position helpers |
| print | debug-oriented AST printer |
| intake | already-preprocessed source intake helpers |
Boundary
The strongest consumer boundary is parc::ir::SourcePackage.
That is the point where PARC stops owning the problem. Anything involving binary evidence or Rust generation is downstream from PARC, even if tests and harnesses compose those crates together elsewhere.
Reading Strategy
Read the book in one of these orders:
- source-contract path: Getting Started -> Source IR -> Extraction -> Header Scanning -> API Contract
- parser-facing path: Getting Started -> Driver API -> Parser API -> AST Model -> Visitor Pattern
- contributor/debug path: Project Layout -> Testing -> Diagnostics And Printing -> Parser Boundaries
Getting Started
This chapter is the shortest path from real source or headers to something that
PARC actually produces today: either a parsed AST or a SourcePackage.
Read parc as the source frontend of the toolchain:
- parc owns preprocessing, parsing, extraction, and source diagnostics
- linc owns link and binary evidence
- gerc owns Rust lowering and emitted build output
The boundary rule is strict: parc/src/** must not depend on linc or gerc,
and any cross-package translation belongs only in tests, examples, or external
harnesses.
Add the crate
[dependencies]
parc = { path = "../parc" }
Pick the right API first
Use parc::driver when you have a file on disk and want PARC to run a system
preprocessor first.
use parc::driver::{parse, Config};
fn main() -> Result<(), parc::driver::Error> {
let config = Config::default();
let parsed = parse(&config, "src/tests/files/minimal.c")?;
println!("preprocessed bytes: {}", parsed.source.len());
println!("top-level items: {}", parsed.unit.0.len());
Ok(())
}
Use parc::parse when you already have source text in memory and want to parse
a fragment directly.
use parc::driver::Flavor;
use parc::parse;
fn main() {
let expr = parse::expression("a + b * 2", Flavor::StdC11).unwrap();
println!("{:#?}", expr);
}
Choose a language flavor
PARC supports three parser modes:
| Flavor | Meaning |
|---|---|
StdC11 | Strict C11 |
GnuC11 | C11 plus GNU syntax such as typeof, attributes, statement expressions, and GNU asm |
ClangC11 | C11 plus Clang-oriented extensions such as availability attributes |
For file-based parsing, Config::default() selects:
- clang -E on macOS
- gcc -E on other targets
You can also select explicitly:
#![allow(unused)]
fn main() {
use parc::driver::Config;
let gnu = Config::with_gcc();
let clang = Config::with_clang();
}
First useful parse example
This example parses a translation unit through the normal driver path:
use parc::driver::{parse, Config};
fn main() -> Result<(), parc::driver::Error> {
let parsed = parse(&Config::default(), "src/tests/files/minimal.c")?;
for (i, item) in parsed.unit.0.iter().enumerate() {
println!("item #{i}: {:?}", item.node);
}
Ok(())
}
First useful scan example
If what you really want is source IR rather than a raw AST, start with
parc::scan:
#![allow(unused)]
fn main() {
use parc::scan::{scan_headers, ScanConfig};
let config = ScanConfig::new().entry_header("demo.h");
let result = scan_headers(&config).unwrap();
println!("items: {}", result.package.items.len());
}
This is the closest thing PARC has to a “frontend product” API.
First fragment example
If you only need one declaration or statement, the direct parser API is faster to wire in:
use parc::driver::Flavor;
use parc::parse;
fn main() {
let decl = parse::declaration("static const int answer = 42;", Flavor::StdC11).unwrap();
let stmt = parse::statement("return answer;", Flavor::StdC11).unwrap();
println!("{:#?}", decl);
println!("{:#?}", stmt);
}
What to read next
- Common Workflows for choosing between scan, driver, parse_preprocessed, and parse
- Driver API for preprocessing and file-based parsing
- Header Scanning for source-contract-first workflows
- Parser API for fragment parsing
Architectural boundary
parc is the source frontend.
It owns:
- preprocessing
- parsing
- source extraction
- source diagnostics
- the parc::ir::SourcePackage artifact
It does not own:
- symbol inventory
- binary validation
- link planning
- Rust code generation
In this repository, cross-package composition should not live in parc library
code. linc and gerc should consume parc output only from tests, examples,
or external harnesses.
Common Workflows
Most confusion with PARC comes from choosing the wrong entry point. This chapter maps common tasks to the right API.
Read the workflows in this order:
- prefer source/frontend workflows that stay inside parc
- serialize SourcePackage when another tool needs the result
- keep any cross-package translation in tests, examples, or external harnesses
Workflow selection
| Situation | API |
|---|---|
| Turn headers into SourcePackage | scan::scan_headers |
| Parse a .c or .h file with includes and macros | driver::parse |
| Parse already-preprocessed text from memory | driver::parse_preprocessed |
| Parse one expression, declaration, statement, or translation unit string | parse::* |
| Walk an AST you already parsed | visit |
| Print an AST for debugging | print::Printer |
Scan headers into source IR
Use this when your real target is the PARC source contract rather than the raw syntax tree.
#![allow(unused)]
fn main() {
use parc::scan::{scan_headers, ScanConfig};
let result = scan_headers(&ScanConfig::new().entry_header("demo.h")).unwrap();
println!("diagnostics: {}", result.package.diagnostics.len());
}
This is the best fit for downstream toolchains that want declarations, provenance, macros, and diagnostics in one package.
Parse a real file
Use this when your source depends on #include, #define, or compiler predefined macros.
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
let config = Config::default();
let parsed = parse(&config, "src/main.c")?;
Ok::<(), parc::driver::Error>(())
}
This gives you:
- parsed.source: the preprocessed source text
- parsed.unit: the AST root
Parse preprocessed text
Use this when another tool already ran preprocessing and you only want PARC to parse.
#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};
let config = Config::default();
let source = r#"
# 1 "generated.i"
typedef int count_t;
count_t answer(void) { return 42; }
"#
.to_string();
let parsed = parse_preprocessed(&config, source)?;
Ok::<(), parc::driver::SyntaxError>(())
}
This is useful for:
- snapshot-based tests
- integration with custom build systems
- reproducing parse bugs from stored .i files
Parse a fragment
Use parc::parse when you are not dealing with a whole file.
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let expr = parse::expression("ptr->len + 1", Flavor::GnuC11)?;
let decl = parse::declaration("unsigned long flags;", Flavor::StdC11)?;
let stmt = parse::statement("if (ok) return 1;", Flavor::StdC11)?;
Ok::<(), parc::parse::ParseError>(())
}
This is the right choice for:
- unit tests
- parser experiments
- editor tooling for partial snippets
Build an analyzer
The normal analyzer flow is:
- Parse with driver or parse
- Traverse with visit
- Use span and loc for diagnostics
Example outline:
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::visit::{self, Visit};
use parc::{ast, span};
struct FunctionCounter {
count: usize,
}
impl<'ast> Visit<'ast> for FunctionCounter {
fn visit_function_definition(
&mut self,
node: &'ast ast::FunctionDefinition,
span: &'ast span::Span,
) {
self.count += 1;
visit::visit_function_definition(self, node, span);
}
}
let parsed = parse(&Config::default(), "src/main.c")?;
let mut counter = FunctionCounter { count: 0 };
counter.visit_translation_unit(&parsed.unit);
println!("functions: {}", counter.count);
Ok::<(), parc::driver::Error>(())
}
Debug the parse tree
Use the printer when you need a human-readable structural dump:
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::print::Printer;
use parc::visit::Visit;
let parsed = parse(&Config::default(), "src/main.c")?;
let mut out = String::new();
Printer::new(&mut out).visit_translation_unit(&parsed.unit);
println!("{}", out);
Ok::<(), parc::driver::Error>(())
}
Rule of thumb
- If you want SourcePackage, start with scan.
- If preprocessing matters and you still want the AST, start with driver.
- If you already have plain text in memory, start with parse.
- If you need diagnostics tied back to original files, keep the preprocessed source string.
- If another crate needs PARC output, stop at SourcePackage and translate it outside parc/src/**.
Driver API
The driver module is the high-level API for file parsing. It runs a system preprocessor, then
parses the resulting text into a TranslationUnit.
Main types
#![allow(unused)]
fn main() {
pub struct Config {
pub cpp_command: String,
pub cpp_options: Vec<String>,
pub flavor: Flavor,
}
pub enum Flavor {
StdC11,
GnuC11,
ClangC11,
}
pub struct Parse {
pub source: String,
pub unit: TranslationUnit,
}
}
The return value matters:
- source is the preprocessed source PARC actually parsed
- unit is the AST root
Basic file parsing
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
let config = Config::default();
let parsed = parse(&config, "examples/demo.c")?;
println!("preprocessed bytes: {}", parsed.source.len());
println!("top-level nodes: {}", parsed.unit.0.len());
Ok::<(), parc::driver::Error>(())
}
Configuring the preprocessor
You can override both the preprocessor executable and its arguments.
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config, Flavor};
let config = Config {
cpp_command: "gcc".into(),
cpp_options: vec![
"-E".into(),
"-Iinclude".into(),
"-DMODE=2".into(),
"-nostdinc".into(),
],
flavor: Flavor::GnuC11,
};
let parsed = parse(&config, "src/input.c")?;
Ok::<(), parc::driver::Error>(())
}
This is the place to inject:
- include directories with -I...
- macro definitions with -D...
- stricter or more isolated builds with -nostdinc
GCC vs Clang helpers
The convenience constructors also select parser flavor:
#![allow(unused)]
fn main() {
use parc::driver::Config;
let gcc = Config::with_gcc(); // gcc -E, GNU flavor
let clang = Config::with_clang(); // clang -E, Clang flavor
}
Use these when you want the parser flavor to match the syntax accepted by the external preprocessor.
Parsing preprocessed text directly
If you already have .i-style content, skip parse and call parse_preprocessed.
#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};
let source = r#"
# 1 "sample.i"
typedef int count_t;
count_t next(count_t x) { return x + 1; }
"#
.to_string();
let parsed = parse_preprocessed(&Config::default(), source)?;
println!("{}", parsed.unit.0.len());
Ok::<(), parc::driver::SyntaxError>(())
}
Error model
driver::parse returns:
#![allow(unused)]
fn main() {
Result<Parse, parc::driver::Error>
}
The error variants are:
- PreprocessorError(io::Error) when the external preprocessor fails
- SyntaxError(SyntaxError) when preprocessing succeeded but parsing failed
Working with syntax errors
SyntaxError includes:
- source: the preprocessed source
- line, column, offset: the parse failure position in that source
- expected: a set of expected tokens
Example:
#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};
let broken = "int main( { return 0; }".to_string();
match parse_preprocessed(&Config::default(), broken) {
Ok(_) => {}
Err(err) => {
eprintln!("parse failed at {}:{}", err.line, err.column);
eprintln!("expected: {:?}", err.expected);
}
}
}
If the preprocessed source contains line markers, SyntaxError::get_location() can reconstruct the
original file and include stack.
Built-in preprocessor
PARC includes a built-in C preprocessor that eliminates the need for an external
gcc or clang binary. Use parse_builtin instead of parse:
#![allow(unused)]
fn main() {
use parc::driver::{parse_builtin, Config};
use std::path::Path;
let config = Config::with_gcc();
let include_paths = vec![Path::new("/usr/include")];
let parsed = parse_builtin(&config, "src/input.c", &include_paths)?;
Ok::<(), parc::driver::Error>(())
}
The built-in preprocessor supports:
- Object-like and function-like macros (with #, ##, __VA_ARGS__)
- Conditional compilation (#if, #ifdef, #ifndef, #elif, #else, #endif)
- #include resolution with configurable search paths
- Include guard detection and optimization
- defined() operator in #if expressions
- Full C constant expression evaluation (arithmetic, bitwise, logical, ternary)
- Predefined target macros (architecture, OS, GCC compatibility)
Macro extraction
To extract all #define macros from a C file (equivalent to gcc -dD -E):
#![allow(unused)]
fn main() {
use parc::driver::capture_macros;
use std::path::Path;
let macros = capture_macros("src/input.c", &[Path::new("/usr/include")])?;
for (name, value) in &macros {
println!("#define {} {}", name, value);
}
Ok::<(), parc::driver::Error>(())
}
This returns all macros active after preprocessing, including predefined target macros and macros from included headers.
Practical advice
- Keep parsed.source if you plan to report errors later.
- Use parse_preprocessed for deterministic regression tests.
- Prefer explicit cpp_options in tools and CI so parse behavior stays reproducible.
- Use parse_builtin when you need zero-dependency parsing without a C toolchain.
Built-in Preprocessor
PARC includes a complete built-in C preprocessor in the parc::preprocess module.
This eliminates the runtime dependency on gcc or clang for preprocessing.
Architecture
The preprocessor is split into focused modules:
| Module | Purpose |
|---|---|
| token | Token types (Ident, Number, Punct, etc.) |
| lexer | Preprocessor tokenizer (§6.4 preprocessing tokens) |
| directive | Directive parser (#define, #if, #include, etc.) |
| macros | Macro table, object-like and function-like expansion |
| expr | #if constant expression evaluator |
| processor | Conditional compilation engine |
| include | #include resolution with search paths and guard tracking |
| predefined | Target-specific predefined macros |
Quick start
#![allow(unused)]
fn main() {
use parc::preprocess::preprocess;
let output = preprocess("#define X 42\nint a = X;\n");
// output.tokens contains the expanded token stream
}
Macro expansion
Both object-like and function-like macros are supported:
#define SIZE 1024
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define LOG(fmt, ...) printf(fmt, __VA_ARGS__)
Features:
- # stringification operator
- ## token pasting operator
- __VA_ARGS__ for variadic macros
- Recursive expansion with “paint set” to prevent infinite recursion (C standard §6.10.3.4)
- Self-referential macros handled correctly (#define X X + 1 expands to X + 1)
Conditional compilation
All standard conditional directives are supported:
#if CONDITION
#ifdef NAME
#ifndef NAME
#elif CONDITION
#else
#endif
The #if expression evaluator supports:
- Integer literals (decimal, octal, hex, binary)
- Character constants ('x')
- defined(NAME) and defined NAME
- All C operators: arithmetic, bitwise, logical, comparison, ternary
- Undefined identifiers evaluate to 0 (per C standard §6.10.1p4)
Include resolution
#![allow(unused)]
fn main() {
use parc::preprocess::{IncludeResolver, Processor};
let mut resolver = IncludeResolver::new();
resolver.add_system_path("/usr/include");
resolver.add_local_path("./include");
let mut processor = Processor::new();
let result = resolver.preprocess_file(
std::path::Path::new("src/main.c"),
&mut processor,
);
}
Features:
- "local" includes search relative to the including file, then local paths
- <system> includes search system paths only
- Include guard detection (#ifndef X / #define X / ... / #endif)
- File content caching
- Maximum include depth (200) to prevent infinite recursion
Predefined macros
Target-specific macros are available for common platforms:
#![allow(unused)]
fn main() {
use parc::preprocess::{MacroTable, Target, define_target_macros};
let mut table = MacroTable::new();
define_target_macros(&mut table, &Target::host());
// Now table has __STDC__, __linux__, __x86_64__, __GNUC__, etc.
}
Supported targets:
- Architectures: x86_64, aarch64, x86, arm
- Operating systems: Linux, macOS (Darwin), Windows
Standard macros defined:
- __STDC__, __STDC_VERSION__, __STDC_HOSTED__
- Architecture-specific: __x86_64__, __aarch64__, __i386__, __arm__
- OS-specific: __linux__, __APPLE__, _WIN32, etc.
- GCC compatibility: __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__
- Type sizes: __SIZEOF_POINTER__, __SIZEOF_INT__, etc.
- Limits: __CHAR_BIT__, __INT_MAX__, __LONG_MAX__
Source IR
The parc::ir module defines the durable intermediate representation produced
by the PARC frontend. It is the primary contract between the parser/extractor
and downstream consumers (LINC, GERC).
Design Principles
- Smaller than the AST: only normalized declarations, not the full syntax tree
- Serializable: all types derive serde::Serialize and serde::Deserialize
- Parser-agnostic: downstream consumers should depend on parc::ir, not parc::ast
- No link/binary concerns: no ABI probing, no library paths, no symbol validation
Key Types
SourcePackage
The top-level container:
#![allow(unused)]
fn main() {
use parc::ir::SourcePackage;
let pkg = SourcePackage::new();
assert!(pkg.is_empty());
}
SourceType
Represents C types at source level:
Void, Bool, Char, SChar, UChar, Short, UShort,
Int, UInt, Long, ULong, LongLong, ULongLong,
Float, Double, LongDouble, Int128, UInt128,
Pointer, Array, Qualified, FunctionPointer,
TypedefRef, RecordRef, EnumRef, Opaque
SourceItem
One extracted declaration:
- Function — function declaration with name, parameters, return type, calling convention
- Record — struct/union with optional fields
- Enum — enum with named variants and optional values
- TypeAlias — typedef declaration
- Variable — extern variable declaration
- Unsupported — placeholder for unrepresentable declarations
SourceMacro
Captured preprocessor macro with form (object-like/function-like), kind, and optional parsed value.
SourceDiagnostic
Frontend diagnostic with kind, severity, message, optional location, and optional item name.
Provenance
- SourceOrigin — where a declaration came from (Entry, UserInclude, System, Unknown)
- DeclarationProvenance — per-item provenance metadata
- MacroProvenance — per-macro provenance metadata
- SourceTarget — compiler/target identity
- SourceInputs — entry headers, include dirs, defines
JSON Serialization
All IR types support JSON roundtrip:
#![allow(unused)]
fn main() {
use parc::ir::SourcePackage;
let pkg = SourcePackage::new();
let json = serde_json::to_string_pretty(&pkg).unwrap();
let back: SourcePackage = serde_json::from_str(&json).unwrap();
assert_eq!(pkg, back);
}
Querying
SourcePackage provides typed accessors:
#![allow(unused)]
fn main() {
// pkg.functions() -> Iterator<Item = &SourceFunction>
// pkg.records() -> Iterator<Item = &SourceRecord>
// pkg.enums() -> Iterator<Item = &SourceEnum>
// pkg.type_aliases() -> Iterator<Item = &SourceTypeAlias>
// pkg.variables() -> Iterator<Item = &SourceVariable>
// pkg.unsupported_items() -> ...
// pkg.find_function("malloc")
// pkg.find_record("point")
// pkg.find_enum("color")
// pkg.find_type_alias("size_t")
// pkg.find_variable("errno")
}
Extraction
The parc::extract module converts a parsed C AST into the normalized
SourcePackage IR. It handles all declaration families.
Quick Start
#![allow(unused)]
fn main() {
use parc::extract;
let source = r#"
typedef unsigned long size_t;
void *malloc(size_t size);
struct point { int x; int y; };
"#;
let pkg = extract::extract_from_source(source).unwrap();
assert_eq!(pkg.function_count(), 1);
assert_eq!(pkg.record_count(), 1);
assert_eq!(pkg.type_alias_count(), 1);
}
API Functions
extract_from_source
Parse and extract in one step using GNU C11 flavor:
#![allow(unused)]
fn main() {
let pkg = parc::extract::extract_from_source("int foo(void);").unwrap();
}
parse_and_extract
Parse and extract with a specific flavor:
#![allow(unused)]
fn main() {
let pkg = parc::extract::parse_and_extract(
"int foo(void);",
parc::driver::Flavor::StdC11,
).unwrap();
}
extract_from_translation_unit
Extract from an already-parsed AST:
#![allow(unused)]
fn main() {
let unit = parc::parse::translation_unit("int foo(void);", parc::driver::Flavor::StdC11).unwrap();
let pkg = parc::extract::extract_from_translation_unit(&unit, Some("test.h".into()));
}
parse_and_extract_resilient
Parse with error recovery and extract what’s possible:
#![allow(unused)]
fn main() {
let pkg = parc::extract::parse_and_extract_resilient(
"int valid;\n@@@bad@@@;\nint also_valid;",
parc::driver::Flavor::StdC11,
);
}
extract_file
Read a file from disk and extract:
#![allow(unused)]
fn main() {
let pkg = parc::extract::extract_file("path/to/header.h", parc::driver::Flavor::GnuC11).unwrap();
assert!(pkg.source_path.is_some());
}
What Gets Extracted
| C Declaration | Source Item |
|---|---|
| typedef int T; | SourceTypeAlias |
| int foo(void); | SourceFunction |
| int foo(void) { ... } | SourceFunction (body ignored) |
| struct S { int x; }; | SourceRecord |
| struct S; | SourceRecord (opaque) |
| union U { ... }; | SourceRecord (Union kind) |
| enum E { A, B }; | SourceEnum |
| extern int x; | SourceVariable |
| static int f() {} | Diagnostic (not bindable) |
| _Static_assert(...) | Diagnostic |
Diagnostics
The extractor produces diagnostics for constructs it cannot fully represent:
- Bitfield widths (partial representation)
- Inline/noreturn specifiers (ignored)
- Calling convention attributes (captured on function, other attributes warned)
- K&R function declarations (unsupported)
- Block pointers (unsupported)
- Static functions (not bindable)
Header Scanning
parc::scan is the highest-level PARC API for people who want the source
contract, not just the AST. It preprocesses headers, parses them, extracts
items, and returns a SourcePackage plus the preprocessed source text.
Quick Start
#![allow(unused)]
fn main() {
use parc::scan::{ScanConfig, scan_headers};
let config = ScanConfig::new()
.entry_header("api.h")
.include_dir("/usr/include")
.define_flag("NDEBUG")
.with_builtin_preprocessor();
let result = scan_headers(&config).unwrap();
let pkg = result.package;
}
What scan really owns
The scan path currently owns all of these steps:
- choose builtin or external preprocessing
- build the preprocessing environment
- parse the preprocessed translation unit
- extract declarations into parc::ir
- attach input metadata and diagnostics
- optionally resolve typedef chains in the produced package
That makes it the closest thing PARC has to a “source artifact producer”.
ScanConfig
Builder for scan configuration:
| Method | Description |
|---|---|
| entry_header(path) | Add an entry-point header |
| include_dir(path) | Add a preprocessor include search path |
| define(name, value) | Add a preprocessor define with value |
| define_flag(name) | Add a flag-style define (no value) |
| with_compiler(cmd) | Set the external preprocessor command |
| with_flavor(flavor) | Set the parser flavor |
| with_builtin_preprocessor() | Use the built-in preprocessor |
Preprocessing Modes
External (default)
Uses gcc -E or clang -E to preprocess headers. Requires the
compiler to be installed. Supports all system headers.
Built-in
Uses parc::preprocess directly. This is useful for controlled fixtures and
repo-local tests. It is not a promise that the built-in preprocessor already
matches every hostile system-header stack.
ScanResult
The scan produces:
- package: SourcePackage — the extracted declarations and metadata
- preprocessed_source: String — the preprocessed source text
Intake
For already-preprocessed source (e.g., output of gcc -E), use
parc::intake::PreprocessedInput:
#![allow(unused)]
fn main() {
use parc::intake::PreprocessedInput;
let input = PreprocessedInput::from_string("int foo(void);")
.with_path("output.i")
.with_flavor(parc::driver::Flavor::GnuC11);
let pkg = input.extract();
}
What to expect from failures
scan_headers() can fail early on preprocessing setup problems, and it can
also return a package with parse diagnostics if preprocessing succeeded but the
source could not be fully parsed.
That split is intentional:
- operational setup failures are Err(...)
- source-level failures become package.diagnostics when possible
Parser API
The parse module exposes direct parsing functions that work on in-memory strings. Unlike
driver, it does not invoke an external preprocessor.
Available entry points
#![allow(unused)]
fn main() {
parse::constant(source, flavor)
parse::expression(source, flavor)
parse::declaration(source, flavor)
parse::statement(source, flavor)
parse::translation_unit(source, flavor)
}
These map to progressively larger grammar fragments.
Return types
The direct parser returns the same ParseResult<T> shape for every entry point:
#![allow(unused)]
fn main() {
type ParseResult<T> = Result<T, ParseError>;
}
ParseError contains:
- line
- column
- offset
- expected
That makes it well suited for parser tests and editor integrations.
Parse an expression
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let expr = parse::expression("value + 1 * scale", Flavor::StdC11)?;
println!("{:#?}", expr);
Ok::<(), parc::parse::ParseError>(())
}
The return type is Box<Node<Expression>>, so you get both the expression and its span.
Parse a declaration
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let decl = parse::declaration(
"static const unsigned long mask = 0xff;",
Flavor::StdC11,
)?;
println!("{:#?}", decl.node);
Ok::<(), parc::parse::ParseError>(())
}
Declarations are useful when you want to inspect:
- storage class
- type qualifiers
- declarator structure
- initializers
Parse a statement
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let stmt = parse::statement(
"for (int i = 0; i < 4; i++) total += i;",
Flavor::StdC11,
)?;
println!("{:#?}", stmt.node);
Ok::<(), parc::parse::ParseError>(())
}
Parse a whole translation unit
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let source = r#"
typedef int count_t;
count_t inc(count_t x) { return x + 1; }
"#;
let unit = parse::translation_unit(source, Flavor::StdC11)?;
println!("items: {}", unit.0.len());
Ok::<(), parc::parse::ParseError>(())
}
Flavor-sensitive parsing
GNU or Clang syntax only parses when you select a compatible flavor.
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let gnu_expr = "({ int x = 1; x + 2; })";
assert!(parse::expression(gnu_expr, Flavor::GnuC11).is_ok());
assert!(parse::expression(gnu_expr, Flavor::StdC11).is_err());
}
When to prefer parse
Use parse when:
- you already have a string in memory
- you are testing grammar behavior directly
- you are parsing snippets, not full files
- you want a deterministic input without shelling out to gcc or clang
Use driver instead when preprocessing is part of the problem.
AST Model
The ast module contains the syntax tree PARC produces after parsing. Most types track the C11
grammar closely, with additional variants for supported GNU and Clang extensions.
Core wrapper types
Many parsed values are wrapped in Node<T>:
#![allow(unused)]
fn main() {
pub struct Node<T> {
pub node: T,
pub span: Span,
}
}
That means most interesting values come with a byte range in the parsed source.
Top-level structure
The root is:
#![allow(unused)]
fn main() {
pub struct TranslationUnit(pub Vec<Node<ExternalDeclaration>>);
}
Top-level items are:
#![allow(unused)]
fn main() {
pub enum ExternalDeclaration {
Declaration(Node<Declaration>),
StaticAssert(Node<StaticAssert>),
FunctionDefinition(Node<FunctionDefinition>),
}
}
So a translation unit is a flat list of:
- declarations
- static assertions
- function definitions
Declarations
Declarations are split into specifiers and declarators:
#![allow(unused)]
fn main() {
pub struct Declaration {
pub specifiers: Vec<Node<DeclarationSpecifier>>,
pub declarators: Vec<Node<InitDeclarator>>,
}
}
This mirrors C’s real syntax. For example:
static const unsigned long value = 42;
roughly becomes:
- storage class specifier: Static
- type qualifier: Const
- type specifiers: Unsigned, Long
- one declarator with identifier value
- initializer expression 42
Declarators are the hard part
Declarator separates:
- the name-bearing core (DeclaratorKind)
- derived layers such as pointers, arrays, and functions
- extension nodes
That design lets PARC represent C declarators without flattening away their structure.
Examples:
- int *p;
- int values[16];
- int (*handler)(int);
- void f(int x, int y);
Expressions
Expression is a large enum covering C expression syntax:
- identifiers
- constants
- string literals
- member access
- calls
- casts
- unary operators
- binary operators
- conditional expressions
- comma expressions
- sizeof, _Alignof
- GNU statement expressions
- offsetof and va_arg expansions
Examples:
x
42
ptr->field
f(a, b)
(int) value
a + b * c
cond ? left : right
({ int t = 1; t + 2; })
Statements
Statement covers:
- labeled statements
- compound blocks
- expression statements
- if
- switch
- while
- do while
- for
- goto
- continue
- break
- return
- GNU asm statements
Blocks contain BlockItem, which can be:
- a declaration
- a static assertion
- another statement
That means a compound statement preserves the declaration/statement distinction instead of erasing everything into one generic node list.
Types and declarator support
Important declaration-side types include:
- TypeSpecifier
- TypeQualifier
- StorageClassSpecifier
- FunctionSpecifier
- AlignmentSpecifier
- TypeName
- DerivedDeclarator
- ParameterDeclaration
- Initializer
- Designator
This is enough to model:
- pointer chains
- arrays and VLA-like forms
- function parameter lists
- designated initializers
- anonymous and named structs/unions/enums
- typedef names
- typeof
Extension nodes
PARC includes explicit AST nodes for extensions instead of hiding them:
- Extension::Attribute
- Extension::AsmLabel
- Extension::AvailabilityAttribute
- TypeSpecifier::TypeOf
- Statement::Asm
- Expression::Statement
That makes it practical to write tools that either support or reject extension syntax intentionally.
Reading the AST effectively
When working with PARC, a useful order is:
- Start at TranslationUnit
- Split declarations from function definitions
- Inspect declarators carefully for type shape
- Use the visitor API instead of hand-recursing everywhere
- Use Printer to learn unfamiliar subtrees
Visitor Pattern
The visit module provides recursive AST traversal. It exposes:
- a Visit<'ast> trait with hook methods
- free functions like visit_expression and visit_function_definition that recurse into children
The important rule
When you override a method, call the free function from parc::visit, not the trait method on
self. Calling self.visit_* from inside the override will recurse back into your override.
Count function definitions
#![allow(unused)]
fn main() {
use parc::{ast, span, visit};
use parc::visit::Visit;
struct FunctionCounter {
count: usize,
}
impl<'ast> Visit<'ast> for FunctionCounter {
fn visit_function_definition(
&mut self,
node: &'ast ast::FunctionDefinition,
span: &'ast span::Span,
) {
self.count += 1;
visit::visit_function_definition(self, node, span);
}
}
}
Collect identifiers from expressions
#![allow(unused)]
fn main() {
use parc::{ast, span, visit};
use parc::visit::Visit;
struct IdentifierCollector {
names: Vec<String>,
}
impl<'ast> Visit<'ast> for IdentifierCollector {
fn visit_identifier(&mut self, node: &'ast ast::Identifier, span: &'ast span::Span) {
self.names.push(node.name.clone());
visit::visit_identifier(self, node, span);
}
}
}
Use the visitor
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::visit::Visit;
let parsed = parse(&Config::default(), "examples/sample.c")?;
let mut counter = FunctionCounter { count: 0 };
counter.visit_translation_unit(&parsed.unit);
println!("functions: {}", counter.count);
Ok::<(), parc::driver::Error>(())
}
When to override which method
- Override visit_translation_unit for whole-file summaries
- Override visit_function_definition for function-level analysis
- Override visit_declaration for declaration inspection
- Override visit_expression for expression-wide checks
- Override narrow hooks like visit_call_expression when you only care about one form
Traversal style
Two common styles work well:
Pre-order
Do work before recursing:
#![allow(unused)]
fn main() {
fn visit_expression(&mut self, node: &'ast ast::Expression, span: &'ast span::Span) {
self.seen += 1;
visit::visit_expression(self, node, span);
}
}
Selective traversal
Only recurse when the node passes a filter:
#![allow(unused)]
fn main() {
fn visit_statement(&mut self, node: &'ast ast::Statement, span: &'ast span::Span) {
if matches!(node, ast::Statement::Return(_)) {
self.returns += 1;
}
visit::visit_statement(self, node, span);
}
}
Practical advice
- Start with a broad hook like visit_expression while learning the tree.
- Narrow to specific hooks once you understand the shapes you care about.
- Pair the visitor with Printer when a subtree is unclear.
Location Tracking
PARC tracks source positions in two related ways:
- Span stores byte offsets into the parsed input
- loc maps byte offsets in preprocessed source back to original files and lines
Span
Span is a byte range:
#![allow(unused)]
fn main() {
pub struct Span {
pub start: usize,
pub end: usize,
}
}
Most AST values are wrapped in Node<T>, which adds a span field:
#![allow(unused)]
fn main() {
pub struct Node<T> {
pub node: T,
pub span: Span,
}
}
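Because spans are plain byte ranges, recovering the text a node covers is just a slice. A minimal sketch using the Span shape above; the example offsets are hand-picked, not parser output:

```rust
// Span as documented: a byte range into the parsed input.
struct Span {
    start: usize,
    end: usize,
}

// Slice the covered text back out of the source string.
fn snippet<'a>(source: &'a str, span: &Span) -> &'a str {
    &source[span.start..span.end]
}

fn main() {
    let source = "int value = 42;";
    // Hypothetical span covering the initializer.
    let span = Span { start: 12, end: 14 };
    assert_eq!(snippet(source, &span), "42");
}
```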
What spans point to
This depends on the API you used:
- with parse::*, spans refer to the string you passed in
- with driver::parse_preprocessed, spans refer to the preprocessed string you passed in
- with driver::parse, spans refer to the preprocessor output stored in Parse::source
That last case is important: spans do not directly point into the original .c file when
preprocessing has inserted line markers or expanded includes.
Mapping offsets back to files
The loc module reads preprocessor line markers like:
# 42 "include/header.h" 1
From those markers, get_location_for_offset reconstructs:
- the active file
- the active line number
- the include stack
Basic example
#![allow(unused)]
fn main() {
use parc::loc::get_location_for_offset;
let src = "# 1 \"main.c\"\nint value;\n";
let (loc, includes) = get_location_for_offset(src, 18);
assert_eq!(loc.file, "main.c");
assert!(includes.is_empty());
}
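For intuition, marker mapping can be sketched as a scan that resets the active file and line whenever a marker appears. This is a simplified local reimplementation, not parc's loc code: it ignores include-stack reconstruction and marker flags:

```rust
// Simplified sketch of line-marker mapping for `# LINE "FILE" [flags]`.
// Returns the active (file, line) for a byte offset.
fn location_for_offset(src: &str, offset: usize) -> (String, usize) {
    let mut file = String::from("<unknown>");
    let mut line = 1usize;
    let mut pos = 0usize;
    for text in src.lines() {
        let end = pos + text.len();
        if offset <= end {
            break; // the offset falls on this line
        }
        if let Some(rest) = text.strip_prefix("# ") {
            let mut parts = rest.splitn(2, ' ');
            if let (Some(num), Some(name)) = (parts.next(), parts.next()) {
                if let Ok(n) = num.parse::<usize>() {
                    // `# n "file"`: the NEXT line is line n of `file`.
                    line = n;
                    file = name.split('"').nth(1).unwrap_or("<unknown>").to_string();
                    pos = end + 1;
                    continue;
                }
            }
        }
        line += 1;
        pos = end + 1;
    }
    (file, line)
}

fn main() {
    let src = "# 1 \"main.c\"\nint value;\n";
    assert_eq!(location_for_offset(src, 18), ("main.c".to_string(), 1));
}
```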
Using spans with locations
The common pattern is:
- take a node span
- use span.start or span.end
- map that offset through loc
Example:
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::loc::get_location_for_offset;
let parsed = parse(&Config::default(), "examples/sample.c")?;
if let Some(first) = parsed.unit.0.first() {
let (loc, include_stack) = get_location_for_offset(&parsed.source, first.span.start);
println!("first item starts in {}:{}", loc.file, loc.line);
println!("include depth: {}", include_stack.len());
}
Ok::<(), parc::driver::Error>(())
}
SyntaxError::get_location
For parser failures in the driver path, driver::SyntaxError already exposes:
#![allow(unused)]
fn main() {
err.get_location()
}
That returns:
- the active source location
- the include chain that led there
This is the best starting point for user-facing diagnostics.
Caveat: byte offsets, not UTF-16 columns
PARC stores Rust byte offsets. That is usually what you want for source processing, but if you are feeding results into another tool that expects a different coordinate system, convert explicitly.
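A hedged sketch of one such conversion, mapping a byte offset within a line to a UTF-16 code-unit column (the convention LSP-style tools commonly expect). The helper name and 0-based convention are local choices:

```rust
// Convert a byte offset within one line of UTF-8 text to a 0-based
// UTF-16 code-unit column. The byte offset must fall on a char boundary.
fn utf16_column(line_text: &str, byte_offset_in_line: usize) -> usize {
    line_text[..byte_offset_in_line]
        .chars()
        .map(|c| c.len_utf16())
        .sum()
}

fn main() {
    // 'é' is 2 bytes in UTF-8 but only 1 UTF-16 code unit,
    // so the byte column and the UTF-16 column differ.
    let line = "café x";
    let byte_col = line.find('x').unwrap();
    assert_eq!(byte_col, 6);
    assert_eq!(utf16_column(line, byte_col), 5);
}
```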
Testing
parc is the source-meaning crate in the toolchain, so its tests should prove
three things:
- the frontend accepts or rejects source as intended
- the extracted SourcePackage contract carries the intended meaning
- cross-package composition can start from parc artifacts without relying on parc internals
PARC has two broad testing layers:
- direct parser/API tests in src/tests
- corpus-style fixtures under test/reftests/ and, when present, test/full_apps/
It also now has explicit grouped failure suites:
- failure_matrix_preprocess for scan/preprocessor hard failures and conservative scan outcomes
- failure_matrix_source for source-parse hard failures, resilient recovery, and diagnostic-preserving extraction
Basic commands
The repository Makefile wraps the normal Cargo flow:
make build
make test
Those run:
- cargo build --release
- cargo test
Hermeticity split
The large PARC surfaces should be read in three groups:
- always-on hermetic baselines
- host-dependent but high-value ladders
- hostile or conservative-failure surfaces
The hermetic baselines should remain the default confidence floor. The host-dependent ladders should strengthen confidence when available. The failure surfaces should prove that PARC stays diagnostic and deterministic when it cannot fully model a header family yet.
Contract tests
Contract tests are the tests a downstream toolchain should treat as the main statement of support:
- parse_api tests for direct parser entry points
- extraction tests for declaration/source modeling
- scan tests for preprocessing and multi-file source intake
- consumability tests for the SourcePackage artifact
If one of those changes meaningfully, the corresponding book chapter should change in the same patch.
Parse API tests
src/tests/parse_api.rs checks the public parse entry points directly.
Examples covered in the repository include:
- constants
- expressions
- declarations
- statements
- translation units
This layer is useful when:
- adding a new public parser entry point
- fixing a small grammar regression
- documenting a minimal parsing example
Reference tests
The reftest harness in src/tests/reftests.rs reads files from test/reftests/.
Each case stores:
- the source snippet
- optional #pragma directives that affect parsing
- an expected AST printout between /*=== and ===*/
That means reftests verify both:
- whether parsing succeeds
- whether the produced tree matches the expected printer output
Reftest update workflow
The harness supports TEST_UPDATE=1 to rewrite expected outputs when printer changes are
intentional.
TEST_UPDATE=1 cargo test reftests
Use that carefully. It is appropriate after deliberate AST or printer changes, not as a substitute for reviewing diffs.
Full-app fixtures
The repository includes a full-app harness in src/tests/full_apps.rs. It supports fixture
directories with a fixture.toml manifest describing:
- mode
- flavor
- entry
- expected
- include_dirs
- allow_system_includes
- tags
Supported modes are:
- translation_unit
- driver
- preprocessed
This is the right layer for:
- multi-file examples
- include-path behavior
- external fixture snapshots
- deterministic .i inputs
Filtering larger fixture runs
The full-app runner supports environment filters:
FULL_APP_FILTER=musl/stdint make test
FULL_APP_TAG=synthetic make test
These are useful when debugging one fixture family instead of running the whole corpus.
Current workspace note
The test harness and README describe test/full_apps, but that directory is not present in this
workspace snapshot. The book documents the supported format because the code and README do.
Extraction tests
src/tests/extraction_fixtures.rs contains fixture-based tests for the extraction pipeline:
typical C patterns (stdio-style, nested structs, typedef chains, function pointers, etc.).
src/extract/mod.rs also contains unit tests for each declaration family.
Hostile header tests
src/tests/hostile_headers.rs covers edge-case and historically problematic C declarations:
deep pointer nesting, anonymous structs/enums, specifier ordering variations, bitfield-only
structs, extreme enum values, forward-then-define patterns, etc.
Recovery tests
src/tests/recovery.rs tests graceful handling of broken, incomplete, or unusual input.
Uses both strict parsing (error expected) and resilient parsing (recovery expected).
Contract and consumability tests
src/tests/contract.rs and src/tests/consumability.rs verify that the SourcePackage
contract is sufficient for downstream consumers. These tests cover iteration patterns, type
navigation, serialization, filtering, merging, and programmatic construction.
Differential tests
src/tests/differential.rs documents the known differences between parc extraction and
bic extraction, ensuring behavioral equivalence on standard declarations and explicitly
documenting intentional divergences (pointer model, no ABI fields, typedef chain
preservation).
Multi-file scan tests
src/tests/scan_multifile.rs covers multi-header scanning scenarios: include chains,
multiple entry headers, cross-file struct references, conditional compilation, include
guards, include directory resolution, and metadata population.
Adding new tests
A practical progression is:
- Add a parse_api unit test for the exact regression
- Add a reftest if you need a stable printed-tree expectation
- Add an extraction test if the issue is about declaration modeling
- Add a scan test if preprocessing or multi-file behavior matters
- Add a full-app fixture if the case needs a full filesystem layout
Cross-crate integration proof
parc library tests should not import linc or gerc.
Cross-crate proof belongs in:
- linc tests/examples that ingest serialized or translated parc artifacts
- gerc tests/examples that ingest translated source artifacts
- external harnesses that exercise the full toolchain
That keeps parc’s own test suite focused on source meaning while still
proving the larger pipeline elsewhere.
What “supported” means
For parc, support means:
- the syntax path is covered by parser-facing tests
- the extracted source meaning is covered by SourcePackage-level tests
- the relevant limitations are documented honestly when behavior is partial or conservative
It does not mean:
- every downstream consumer will accept the artifact unchanged
- every hostile system header already has perfect preprocessing coverage
- every parser-internal helper is part of the public contract
Diagnostics And Printing
PARC includes two pieces that are especially useful when building tools on top of the parser:
- detailed parse errors
- a tree printer for AST inspection
Direct parser diagnostics
The parse module returns ParseError:
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
match parse::expression("a +", Flavor::StdC11) {
Ok(_) => {}
Err(err) => {
eprintln!("line: {}", err.line);
eprintln!("column: {}", err.column);
eprintln!("offset: {}", err.offset);
eprintln!("expected: {:?}", err.expected);
}
}
}
This is enough for:
- editor error messages
- parser regression tests
- grammar debugging
Driver diagnostics
The driver adds preprocessor context on top:
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config, Error};
match parse(&Config::default(), "broken.c") {
Ok(_) => {}
Err(Error::PreprocessorError(err)) => {
eprintln!("preprocessor failed: {}", err);
}
Err(Error::SyntaxError(err)) => {
let (loc, includes) = err.get_location();
eprintln!("syntax error in {}:{}:", loc.file, loc.line);
eprintln!("column in preprocessed source: {}", err.column);
for include in includes {
eprintln!("included from {}:{}", include.file, include.line);
}
}
}
}
Formatting expected tokens
driver::SyntaxError also has format_expected, which is useful when building a custom
human-readable error message.
AST printing
print::Printer is a visitor that renders the tree as an indented text dump.
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::print::Printer;
use parc::visit::Visit;
let parsed = parse(&Config::default(), "examples/sample.c")?;
let mut out = String::new();
Printer::new(&mut out).visit_translation_unit(&parsed.unit);
println!("{}", out);
Ok::<(), parc::driver::Error>(())
}
The printer is ideal when:
- learning how PARC models a syntax form
- updating reftests
- debugging traversal code
A practical debugging loop
When a new syntax form is not behaving the way you expect:
- Parse the smallest reproducer with parse::*
- Print the AST with Printer
- Inspect spans on the nodes you care about
- Switch to driver if preprocessing is involved
- Map spans back to original files with loc
Project Layout
This chapter is for contributors and advanced users who want to understand where the parser logic lives.
Top-level crate layout
The repository is organized around a small public API surface and several internal support modules.
| Path | Purpose |
|---|---|
src/lib.rs | Public module exports |
src/ir/ | Source-level IR (SourcePackage, SourceType, etc.) |
src/extract/ | Declaration extraction from AST to IR |
src/scan/ | Header scanning (preprocess + parse + extract) |
src/intake/ | Preprocessed source intake |
src/driver.rs | File-based parsing via external preprocessing |
src/preprocess/ | Built-in C preprocessor |
src/parse.rs | Direct fragment parsing API |
src/ast/ | AST type definitions |
src/visit/ | Recursive visitor functions and trait |
src/parser/ | Parser implementation split by grammar area |
src/loc.rs | Preprocessor line-marker location mapping |
src/span.rs | Span and Node<T> wrappers |
src/print.rs | AST debug printer |
src/tests/ | Test harnesses and integration-style tests |
AST and visitor organization
The AST is split into focused files:
- src/ast/declarations.rs
- src/ast/expressions.rs
- src/ast/statements.rs
- src/ast/extensions.rs
- src/ast/lexical.rs
The visitor layer mirrors that structure in src/visit/.
That symmetry is useful:
- if you add a new AST node, you usually need a matching visitor hook
- if you are looking for traversal behavior, the corresponding file is easy to find
Parser organization
The parser implementation is divided by grammar topics instead of one giant file. Examples include:
- translation_units_and_functions.rs
- declarations_entry.rs
- declarators.rs
- statements_iteration_and_jump.rs
- casts_and_binary.rs
- typeof_and_ts18661.rs
That split makes grammar work more localized.
Internal environment handling
Parsing depends on Env, which tracks parser state such as known typedef names and enabled syntax
flavor. The public parse and driver APIs construct the right environment for you.
This matters because some C parses depend on whether an identifier is currently known as a typedef.
Testing layout
src/tests/ contains:
- API tests
- reftest harnesses
- larger fixture harnesses
- external/system-header related coverage
When changing parser behavior, expect to touch both narrow tests and corpus-style fixtures.
Contributor workflow
A good change sequence is:
- reproduce with the smallest possible parse::* input
- add or update a focused test
- inspect the tree with Printer
- patch the grammar or AST logic
- run make test
Why the parser is split this way
The parser is organized by syntax areas because C grammar work tends to be local but not trivial. That split helps with three things:
- keeping grammar changes reviewable
- matching failures to the right part of the parser quickly
- reducing the chance that one large parser file becomes impossible to maintain
For example:
- declaration bugs often land in declarations_entry.rs, declarators.rs, or related files
- expression bugs often land in primary_and_generic.rs, casts_and_binary.rs, or nearby files
- statement bugs often land in the statements_* files
Public versus internal boundaries
These are normal consumer-facing modules:
- ir (primary data contract)
- extract
- scan
- intake
- driver
- preprocess
- parse
- ast
- visit
- loc
- span
- print
These are implementation-oriented and should not be treated as a stable downstream boundary:
- parser
- env
- astutil
- strings
That distinction matters when you are extending the book or the crate API. Documentation should prefer the consumer-facing modules unless the chapter is specifically contributor-oriented.
API Contract
This chapter records the intended public consumer surface of parc.
It is not a blanket promise about every future change. It is the current
guidance for how downstream tools should integrate with the crate without
depending on parser internals or accidentally turning parc into a shared ABI
owner for the rest of the pipeline.
First Principle
parc is the source-meaning layer of the pipeline: preprocessing, parsing, and
source-level semantic extraction.
The intended downstream pattern is:
- scan headers or parse source via driver, scan, or parse
- extract normalized declarations via extract
- consume the SourcePackage IR from ir
- use visit, span, and loc to analyze AST-level details if needed
Downstream consumers that want source contracts should depend on parc::ir,
not on parc::ast directly.
More importantly for this repository:
- parc library code must not depend on linc or gerc
- linc and gerc should not require parc as a library dependency in their production code paths
- integration should happen through PARC-owned artifacts in tests/examples or external harnesses
- there is no shared ABI crate that all three libraries depend on
- there is no obligation to preserve discarded pipeline shapes for backward compatibility
Preferred public surface
These are the main consumer-facing modules:
| Module | Role | Current expectation |
|---|---|---|
parc::ir | source-level IR (SourcePackage) | preferred data contract |
parc::extract | declaration extraction from AST | preferred extraction entry point |
parc::scan | header scanning (preprocess + extract) | preferred high-level entry point |
parc::intake | preprocessed source intake | preferred for already-preprocessed source |
parc::driver | parse files and preprocessed source | preferred parse entry point |
parc::preprocess | built-in C preprocessor | preferred preprocessing entry point |
parc::parse | parse string fragments directly | preferred low-level entry point |
parc::ast | typed syntax tree | internal data model |
parc::visit | recursive traversal hooks | preferred traversal API |
parc::span | byte-range metadata | preferred location primitive |
parc::loc | map offsets back to files/lines | preferred diagnostics helper |
parc::print | AST debug dumping | preferred inspection helper |
Internal modules are not the contract
These modules are public only indirectly through behavior, not as a recommended downstream surface:
- parser
- env
- astutil
- strings
If a downstream tool depends directly on how those modules work, it is probably coupling itself to implementation details rather than the intended library boundary.
Normative consumer rules
If you are building on top of parc, the safest current rules are:
- use driver when preprocessing matters
- use parse::* for fragment parsing or already-controlled text inputs
- treat ir::SourcePackage as the primary output contract
- use visit for traversal instead of hand-rolling recursive descent everywhere
- use span and loc for diagnostics rather than guessing source positions
- do not rely on exact error-message strings for durable control flow
- do not treat PARC as semantic analysis, type checking, or ABI proof
- if another crate needs PARC output, serialize the PARC-owned artifact and translate it outside library code
What is part of the practical contract
Today the strongest practical contract is:
- ir::SourcePackage, SourceType, SourceItem, and all IR types — the primary data contract
- extract::extract_from_source, extract_from_translation_unit, parse_and_extract, parse_and_extract_resilient
- scan::ScanConfig, scan_headers, ScanResult
- intake::PreprocessedInput
- ir::SourcePackageBuilder — programmatic package construction
- driver::Config, Flavor, Parse, Error, SyntaxError, parse_builtin, and capture_macros
- preprocess::{Processor, IncludeResolver, MacroTable, Lexer, preprocess, tokens_to_text, Target, define_target_macros}
- parse::{constant, expression, declaration, statement, translation_unit, translation_unit_resilient}
- the AST model under ast
- the traversal hooks under visit
- the span/location model under span and loc
Those are the surfaces the rest of the book assumes consumers will use.
The important correction is this: PARC has two practical contracts today, not one:
- a source-contract path centered on ir, extract, and scan
- a parser-facing path centered on driver, parse, ast, and visit
The docs should not pretend the AST side does not exist, because the crate very much exposes it.
What is intentionally weaker
The following should be treated as less stable than the core parsing surface:
- exact debug formatting of AST values
- exact Display wording of parse errors
- internal parser file layout under src/parser/
- incidental ordering of implementation helper functions
These details are useful for debugging and contribution work, but they are not the main consumer contract.
Explicit non-goals
The current contract does not promise:
- semantic name resolution beyond parsing decisions such as typedef handling
- type checking
- ABI compatibility guarantees
- full support for every GCC or Clang extension
- preservation of raw macro definitions beyond what capture_macros provides
Those are outside the scope of PARC as a source frontend.
Downstream posture
For long-lived integrations, the safest posture is:
- use scan or extract as your primary entry point — these produce SourcePackage
- consume ir::SourcePackage rather than raw AST types where possible
- use driver and parse only when you need AST-level access
- treat unsupported syntax and parser errors as normal outcomes
- keep tests with representative preprocessed inputs for the syntax families you depend on
- keep cross-package translation in tests/examples/harnesses rather than adding library dependencies
- see Migration From bic if you are transitioning from bic
End-To-End Workflows
This chapter ties the public modules together into practical usage patterns.
Workflow 1: Parse A Real C File
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
let parsed = parse(&Config::default(), "include/demo.h")?;
println!("items: {}", parsed.unit.0.len());
Ok::<(), parc::driver::Error>(())
}
This is the baseline path when:
- includes matter
- macros matter
- compiler predefined types or macros matter
The result gives you both the AST and the exact preprocessed source PARC saw.
Workflow 2: Parse A Preprocessed Snapshot
#![allow(unused)]
fn main() {
use parc::driver::{parse_preprocessed, Config};
let source = std::fs::read_to_string("snapshots/demo.i").unwrap();
let parsed = parse_preprocessed(&Config::default(), source)?;
Ok::<(), parc::driver::SyntaxError>(())
}
Use this when:
- reproducing a parse bug
- building deterministic tests
- integrating with a nonstandard build system
This workflow isolates parser behavior from preprocessor invocation behavior.
Workflow 3: Parse A Fragment In Tests
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let decl = parse::declaration("typedef unsigned long word_t;", Flavor::StdC11)?;
let expr = parse::expression("ptr->field + 1", Flavor::GnuC11)?;
Ok::<(), parc::parse::ParseError>(())
}
This is the right workflow for:
- unit tests
- grammar debugging
- editor or language-server experiments
Workflow 4: Build A Syntax Analyzer
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::visit::{self, Visit};
use parc::{ast, span};
struct ReturnCounter {
count: usize,
}
impl<'ast> Visit<'ast> for ReturnCounter {
fn visit_statement(&mut self, node: &'ast ast::Statement, span: &'ast span::Span) {
if matches!(node, ast::Statement::Return(_)) {
self.count += 1;
}
visit::visit_statement(self, node, span);
}
}
let parsed = parse(&Config::default(), "src/main.c")?;
let mut counter = ReturnCounter { count: 0 };
counter.visit_translation_unit(&parsed.unit);
println!("return statements: {}", counter.count);
Ok::<(), parc::driver::Error>(())
}
This is the normal PARC analyzer pattern:
- parse
- traverse
- inspect spans and locations
- emit your own diagnostics or analysis data
Workflow 5: Build Diagnostics With Real File Locations
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config};
use parc::loc::get_location_for_offset;
let parsed = parse(&Config::default(), "src/main.c")?;
for item in &parsed.unit.0 {
let (loc, _) = get_location_for_offset(&parsed.source, item.span.start);
println!("top-level item starts at {}:{}", loc.file, loc.line);
}
Ok::<(), parc::driver::Error>(())
}
Use this when your users care about original file locations rather than raw byte offsets in the preprocessed stream.
Workflow 6: Debug A New Syntax Form
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
use parc::print::Printer;
use parc::visit::Visit;
let expr = parse::expression("({ int x = 1; x + 1; })", Flavor::GnuC11)?;
let mut out = String::new();
Printer::new(&mut out).visit_expression(&expr.node, &expr.span);
println!("{}", out);
Ok::<(), parc::parse::ParseError>(())
}
This is the most effective loop when exploring unfamiliar AST shapes.
Workflow 7: Regression-Test A Parse Failure
A practical bug workflow is:
- capture the smallest failing input
- decide whether preprocessing is relevant
- add a parse_api test or a reftest
- patch the grammar
- verify the printed AST or error outcome
That keeps parser changes concrete and reviewable.
Error Surface
This chapter describes the error model PARC exposes today.
Two layers of errors
PARC has two main error surfaces:
- direct parser errors from parse
- driver errors from driver
The distinction is important because the driver includes external preprocessing.
Direct parser errors
The parse module returns:
#![allow(unused)]
fn main() {
Result<T, parc::parse::ParseError>
}
ParseError includes:
- line
- column
- offset
- expected
This error means:
- the parser could not consume the full input
- the failure happened at the given position
- one of the listed tokens or grammar expectations would have allowed parsing to continue
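Those structured fields are enough to build caret-style output without touching Display text. A sketch with a local helper; the field names and 1-based line/column convention are illustrative, mirroring but not reusing parc's error type:

```rust
// Render a caret diagnostic from structured fields (1-based line/column),
// the way a tool might present a ParseError to users.
fn render_caret(source: &str, line: usize, column: usize) -> String {
    let text = source.lines().nth(line - 1).unwrap_or("");
    format!("{}\n{}^", text, " ".repeat(column - 1))
}

fn main() {
    let source = "int x = ;";
    // Suppose the parser reported line 1, column 9 (the stray `;`).
    let out = render_caret(source, 1, 9);
    assert_eq!(out, "int x = ;\n        ^");
}
```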
Driver errors
The driver module returns:
#![allow(unused)]
fn main() {
Result<parc::driver::Parse, parc::driver::Error>
}
That error enum has two branches:
- PreprocessorError(io::Error)
- SyntaxError(SyntaxError)
This split is a real contract boundary:
- preprocessor failures mean PARC never reached parsing
- syntax failures mean preprocessing succeeded and PARC failed on the resulting text
SyntaxError
driver::SyntaxError contains:
- source
- line
- column
- offset
- expected
It also provides:
- get_location() to map back to source files and include stack
- format_expected() for user-facing token formatting
What consumers should key on
For durable control flow, consumers should branch on:
- error type
- structured fields such as line, column, and expected
Consumers should not branch on:
- exact human-readable Display text
- incidental token ordering inside formatted strings
Practical examples
Fragment parsing
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
match parse::statement("if (x) {", Flavor::StdC11) {
Ok(_) => {}
Err(err) => {
eprintln!("statement parse failed at {}:{}", err.line, err.column);
}
}
}
File parsing
#![allow(unused)]
fn main() {
use parc::driver::{parse, Config, Error};
match parse(&Config::default(), "broken.c") {
Ok(_) => {}
Err(Error::PreprocessorError(err)) => {
eprintln!("preprocessor failure: {}", err);
}
Err(Error::SyntaxError(err)) => {
let (loc, includes) = err.get_location();
eprintln!("syntax failure in {}:{} ({})", loc.file, loc.line, err.column);
eprintln!("include depth: {}", includes.len());
}
}
}
Resilient parsing
parse::translation_unit_resilient provides error recovery. When a declaration fails to parse, it
skips to the next synchronization point (; at file scope or } at brace depth zero) and continues
parsing.
#![allow(unused)]
fn main() {
use parc::driver::Flavor;
use parc::parse;
let tu = parse::translation_unit_resilient(source, Flavor::GnuC11);
// tu.0 contains all successfully parsed declarations
// unparseable regions are silently skipped
}
Use this when you want partial results from files that contain unsupported syntax. The strict
translation_unit function is still preferred when you need to detect all errors.
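The synchronization rule above can be sketched as a standalone scan. This toy version ignores strings, comments, and character literals, which real recovery must handle; it only shows the skip-to-sync-point idea:

```rust
// Skip past broken input to the next sync point: a `;` at brace depth
// zero, or the `}` that returns the depth to zero. Sync tokens are
// ASCII, so `+ 1` is a valid byte step past them.
fn skip_to_sync_point(src: &str, start: usize) -> usize {
    let mut depth = 0i32;
    for (i, c) in src[start..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth <= 0 {
                    return start + i + 1; // resume just after the `}`
                }
            }
            ';' if depth == 0 => return start + i + 1, // resume after `;`
            _ => {}
        }
    }
    src.len()
}

fn main() {
    let src = "@@bad decl@@; int ok;";
    let resume = skip_to_sync_point(src, 0);
    assert_eq!(&src[resume..], " int ok;");
}
```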
Failure-model guidance
Downstream tools should treat parse failures as normal, reportable outcomes.
That means:
- do not crash just because one translation unit fails
- surface the structured error data to the caller
- retain the preprocessed source when debugging hard failures
Explicit limitations of the current error model
The current model does not provide:
- semantic diagnostics
- fix-it suggestions
- a typed taxonomy for every grammar category of failure
- warning channels separate from parse success
PARC’s errors are syntax-oriented rather than compiler-like.
Flavor And Extension Support
PARC supports three language flavors and several extension families.
This chapter records what that means in practice.
Flavors
| Flavor | Intent |
|---|---|
StdC11 | strict C11 parsing |
GnuC11 | C11 plus GNU-oriented syntax |
ClangC11 | C11 plus Clang-oriented syntax |
Use the flavor that matches the syntax you expect in the input.
Why flavor matters
Some C parses are ambiguous or extension-specific.
Examples include:
- GNU statement expressions
- typeof
- GCC-style attributes
- GNU asm statements
- Clang availability attributes
If you parse extension-heavy source in StdC11, errors are expected.
GNU-oriented support
The AST and parser explicitly model GNU-oriented syntax such as:
- typeof
- statement expressions
- GNU asm statements
- asm labels
- attributes
- designated range initializers
In practice, if the source is GCC-flavored or Linux-kernel-like, GnuC11 is usually the right
starting point.
Clang-oriented support
PARC also models Clang-specific or Clang-common syntax including:
- Clang availability attributes
- the
ClangC11flavor path indriverandparse
If your preprocessing and syntax assumptions are built around Clang, use Config::with_clang() or
Flavor::ClangC11.
C23 keyword support
PARC accepts the following C23 keywords in all flavors, because modern compilers (GCC 15+) emit them in preprocessed output by default:
| C23 keyword | C11 equivalent | Notes |
|---|---|---|
bool | _Bool | type specifier |
true | 1 | parsed as integer constant |
false | 0 | parsed as integer constant |
nullptr | 0 | parsed as integer constant |
static_assert | _Static_assert | declaration |
alignas | _Alignas | alignment specifier |
alignof | _Alignof | expression |
thread_local | _Thread_local | storage class |
constexpr | (none) | storage class specifier |
typeof | __typeof__ | type specifier (was GNU-only) |
_BitInt(N) | (none) | type specifier with width |
noreturn | _Noreturn | function specifier |
complex | _Complex | type specifier |
GCC extension types
PARC recognizes these GCC extension types in GNU mode:
| Type | AST variant | Notes |
|---|---|---|
__int128 | TypeSpecifier::Int128 | non-unique, combinable with signed/unsigned |
__float128 | TypeSpecifier::Float128 | unique type specifier |
__builtin_va_list | typedef | handled as built-in typedef name |
Standard-mode guidance
Use StdC11 when:
- you want to reject vendor syntax deliberately
- your test corpus is intended to stay close to the standard
- you want parser behavior that is easier to reason about across compilers
Practical consumer policy
A useful integration policy is:
- default to the compiler family you actually preprocess with
- add tests for the specific extension families you rely on
- treat unsupported extensions as explicit parser limitations, not random bugs
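The "default to the compiler family you actually preprocess with" rule can be sketched as a tiny policy function. This is illustrative only: the `Flavor` variants mirror the names used in this chapter, but `choose_flavor` and the substring matching are hypothetical, not a parc API.

```rust
// Local mirror of the flavor names this chapter uses; illustrative only.
#[derive(Debug, PartialEq)]
enum Flavor {
    StdC11,
    GnuC11,
    ClangC11,
}

/// Hypothetical policy: pick the flavor that matches the
/// preprocessor driver you actually run.
fn choose_flavor(preprocessor: &str) -> Flavor {
    if preprocessor.contains("clang") {
        Flavor::ClangC11
    } else if preprocessor.contains("gcc") || preprocessor.contains("cc") {
        Flavor::GnuC11
    } else {
        // Unknown driver: fall back to the strict standard grammar.
        Flavor::StdC11
    }
}

fn main() {
    assert_eq!(choose_flavor("clang -E"), Flavor::ClangC11);
    assert_eq!(choose_flavor("gcc -E"), Flavor::GnuC11);
    assert_eq!(choose_flavor("unknown-driver"), Flavor::StdC11);
    println!("flavor policy ok");
}
```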
What this chapter does not claim
This chapter does not claim exhaustive support for every extension accepted by GCC or Clang.
It does claim that PARC has explicit support for several important extension families and that the flavor setting is part of the API contract for using them correctly.
Unsupported Cases
This chapter records the important unsupported or intentionally out-of-scope areas.
The goal is to prevent downstream users from mistaking absence of detail for implicit support.
It also acts as the current frontend-family closure ledger. Every hard family should fit into one of these buckets:
- fully supported
- resilient-only support
- diagnostics-only improvement
- intentional rejection
For the Level 1 production claim, this ledger is part of the real contract. If a family is not classified here, it should not be treated as in-scope production behavior.
Frontend-Family Closure Ledger
The current important families are:
| Family | Current state | Notes |
|---|---|---|
| K&R function declarations | diagnostics-only improvement | PARC preserves the function surface and emits explicit unsupported diagnostics. |
| block pointers | intentional rejection | They still fail in parsing; current work is about sharper diagnostics, not pretending they lower cleanly. |
| bitfield-heavy records | resilient-only support | PARC keeps record shape and bit widths, but layout truth remains partial. |
| vendor attributes and calling-convention attributes | resilient-only support | PARC preserves the declaration and emits partial diagnostics when attributes are ignored. |
| macro-heavy include stacks | fully supported on current canonical corpora | The canonical corpora are the proof surface; more corpora still need to land before claiming broad closure. |
| hostile include-order and typedef-chain environments | fully supported on current canonical corpora | Treat this as corpus-backed support, not universal extension parity. |
This ledger is intentionally blunt:
- if a family is not yet honestly representable, reject it
- if a family is only partially representable, say so
- if a family is only proven on named corpora, document that exact scope
The Level 1 production envelope is Linux/ELF-first and corpus-backed. That means “supported” here should be read as one of:
- fully supported within the named canonical corpus
- partially supported with explicit diagnostics
- rejected explicitly as out of scope
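The ledger's closed-world rule (an unclassified family is out of scope) can be sketched as a classification that defaults to rejection. The enum and family strings below are illustrative stand-ins, not parc types:

```rust
// Hypothetical classification mirroring the ledger buckets.
#[derive(Debug, PartialEq)]
enum Support {
    FullOnCanonicalCorpora,
    ResilientOnly,   // surface preserved, partial truth
    DiagnosticsOnly, // explicit unsupported diagnostics
    Rejected,        // honest parse failure
}

fn classify(family: &str) -> Support {
    match family {
        "macro-heavy include stacks"
        | "hostile include-order and typedef-chain environments" => {
            Support::FullOnCanonicalCorpora
        }
        "bitfield-heavy records"
        | "vendor attributes and calling-convention attributes" => Support::ResilientOnly,
        "K&R function declarations" => Support::DiagnosticsOnly,
        "block pointers" => Support::Rejected,
        // An unclassified family is not in-scope production behavior.
        _ => Support::Rejected,
    }
}

fn main() {
    assert_eq!(classify("block pointers"), Support::Rejected);
    assert_eq!(classify("bitfield-heavy records"), Support::ResilientOnly);
    assert_eq!(classify("anything unlisted"), Support::Rejected);
    println!("ledger closed-world rule ok");
}
```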
Semantic analysis
PARC does not provide:
- full name resolution
- type checking
- constant folding as a stable analysis contract
- ABI or layout proof
- compiler-quality warnings
It is a parser with source-structure support, not a complete compiler frontend.
Preprocessing
PARC does not implement a standalone C preprocessor in the driver path.
Instead it depends on an external preprocessor command such as:
- `gcc -E`
- `clang -E`
That means PARC does not try to normalize every compiler’s preprocessing behavior internally.
The built-in preprocessor is increasingly useful for scan-first workflows, but it is still a scoped compatibility surface rather than a promise of universal host-header parity.
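As a sketch of the external-driver idea, a consumer can shell out to the preprocessor before handing text to the parser. The `gcc -E`/`clang -E` interface is standard, but the surrounding helper is illustrative, not parc's actual driver code:

```rust
use std::process::Command;

/// Illustrative external-preprocessor invocation. parc's `driver`
/// module wraps this idea; this helper is a hypothetical sketch.
fn preprocess(cmd: &str, file: &str) -> std::io::Result<String> {
    let output = Command::new(cmd).arg("-E").arg(file).output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    // Stand-in command so the sketch runs anywhere; in real use this
    // would be "gcc" or "clang" and a real source file.
    let text = preprocess("true", "demo.c").unwrap();
    assert!(text.is_empty());
    println!("external preprocess hook ok");
}
```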
Extension completeness
PARC supports several GNU and Clang extensions, but the project does not promise complete parity with every extension accepted by modern GCC or Clang releases.
Downstream tools should not assume:
- full GNU extension completeness
- full Clang extension completeness
- identical acceptance behavior across all compiler-version-specific syntax edges
Macro inventory and expansion modeling
PARC parses the post-preprocessing result. It does not expose a first-class macro inventory or a stable semantic model of macro definitions as its own output contract.
If you need macro capture as data, that is outside PARC’s current scope.
Translation-unit semantics
PARC can parse translation units, but it does not guarantee:
- cross-file symbol resolution
- duplicate-definition analysis as a stable feature
- semantic correctness of declarations
- linkability of parsed declarations
Those tasks belong to later analysis layers, not the parser itself.
Diagnostics depth
PARC does not currently provide:
- warning classes
- fix-it suggestions
- rich categorized error codes
- a stable diagnostic JSON schema
The current error model is strong enough for syntax handling, not full compiler UX.
The practical rule for the remaining hard families is:
- if PARC can keep a trustworthy declaration surface, it should do so and emit diagnostics
- if PARC cannot keep a trustworthy declaration surface, it should reject the construct explicitly
Consumer guidance
Downstream tools should treat these gaps as explicit non-guarantees.
That means:
- build policy around syntax success and failure, not semantic certainty
- isolate extension-heavy assumptions behind tests
- keep representative preprocessed fixtures for any hard parser dependency
- treat the closure ledger above as part of the real contract, not as a vague future roadmap
Reproducibility
Parsing C is sensitive to the exact preprocessor environment.
This chapter documents how to keep PARC-based workflows reproducible.
Main reproducibility risks
The biggest sources of drift are:
- different preprocessor executables
- different default include paths
- different predefined macros
- different parser flavor settings
- different preprocessed snapshots in tests
Best practices
For durable automation:
- prefer explicit `Config` values over ambient defaults in CI
- pin include paths with `-I...` when they matter
- use `-nostdinc` for isolated fixture testing when appropriate
- keep preprocessed snapshots for hard parser regressions
- keep the parser flavor explicit in tests
Deterministic parse debugging
If a real file parse is inconsistent across machines, a strong debugging move is:
- capture the preprocessed output
- switch the failing test to `parse_preprocessed`
- debug PARC against the stable snapshot
That separates:
- preprocessing differences
- parser differences
Reftests and snapshots
The reftest harness already encourages deterministic expectations by comparing against printed AST
output. For parser bugs that depend on preprocessing, a pinned .i file is often even better.
Consumer guidance
If PARC is part of a larger pipeline, keep the following recorded somewhere durable:
- preprocessor executable
- preprocessor arguments
- flavor
- representative fixtures
- expected parse outcome
Without that context, debugging parser regressions is much slower.
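One way to make that context durable is to record it as data next to every parse outcome. The struct and field names below are a hypothetical sketch of what to record, not a parc type:

```rust
// Hypothetical record of the parse environment; field names are
// illustrative, not part of parc's API.
#[derive(Debug, Clone, PartialEq)]
struct ParseEnvironment {
    preprocessor: String,           // e.g. "gcc"
    preprocessor_args: Vec<String>, // pinned include paths, defines, etc.
    flavor: String,                 // keep the parser flavor explicit
    fixtures: Vec<String>,          // representative preprocessed snapshots
}

impl ParseEnvironment {
    /// A stable one-line fingerprint, suitable for logging next to
    /// every parse outcome so regressions can be reproduced later.
    fn fingerprint(&self) -> String {
        format!(
            "{} {} [flavor={}]",
            self.preprocessor,
            self.preprocessor_args.join(" "),
            self.flavor
        )
    }
}

fn main() {
    let env = ParseEnvironment {
        preprocessor: "gcc".into(),
        preprocessor_args: vec!["-E".into(), "-I".into(), "include/".into()],
        flavor: "GnuC11".into(),
        fixtures: vec!["fixtures/event_loop.i".into()],
    };
    assert_eq!(env.fingerprint(), "gcc -E -I include/ [flavor=GnuC11]");
    println!("{}", env.fingerprint());
}
```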
Stable Usage Patterns
This chapter records usage patterns that are safest for downstream consumers.
Pattern 1: Separate parsing from analysis
A durable integration pattern is:
- parse with PARC
- convert the AST into your own analysis model if needed
- run later semantic or policy logic on that model
This avoids coupling too much of your tool to every detail of PARC’s raw AST layout.
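The decoupling step can be sketched with a local stand-in: `AstFunction` below represents whatever parc AST node you traverse, and `FnSig` is your tool's own model. Both types are illustrative, not parc's AST:

```rust
// Stand-in for a parc AST node (illustrative, not parc's real shape).
struct AstFunction {
    name: String,
    params: Vec<String>, // parameter type names, simplified
    variadic: bool,
}

/// Your analysis model: only the facts your tool actually needs.
/// Semantic and policy passes run against this, not the raw AST.
#[derive(Debug, PartialEq)]
struct FnSig {
    name: String,
    arity: usize,
    variadic: bool,
}

fn lower(ast: &AstFunction) -> FnSig {
    FnSig {
        name: ast.name.clone(),
        arity: ast.params.len(),
        variadic: ast.variadic,
    }
}

fn main() {
    let ast = AstFunction {
        name: "printf".into(),
        params: vec!["const char *".into()],
        variadic: true,
    };
    let sig = lower(&ast);
    assert_eq!(sig, FnSig { name: "printf".into(), arity: 1, variadic: true });
    println!("{sig:?}");
}
```

If PARC's AST layout shifts, only `lower` needs to change; everything downstream of `FnSig` is insulated.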
Pattern 2: Preserve preprocessed source for diagnostics
If you use `driver`, keep `Parse::source` around as long as you may need diagnostics.
That enables:
- mapping spans back to files and lines
- debugging parser failures later
- reproducing failures from stored snapshots
Pattern 3: Make flavor explicit
Even when defaults are convenient, explicit flavor choices are easier to maintain in tools and tests.
Prefer:
- `Flavor::StdC11` for strict grammar tests
- `Flavor::GnuC11` when GNU syntax is intentional
- `Flavor::ClangC11` when Clang-specific syntax is intentional
Pattern 4: Test the syntax you depend on
If your downstream tool depends on a specific syntax family, keep representative tests for it.
Examples:
- function-pointer declarators
- designated initializers
- GNU statement expressions
- inline asm
- availability attributes
Pattern 5: Treat parse failure as data
A mature integration does not assume every input will parse. It treats parse failure as a structured, reportable outcome.
That means:
- returning parse diagnostics to the caller
- logging the failing source context when appropriate
- keeping failure fixtures in the test corpus
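A minimal sketch of "failure as a structured outcome": the enum and `report` helper below are hypothetical, with parc's own diagnostics slotting into the `Failed` variant in a real integration:

```rust
// Hypothetical structured outcome type; parc's error data would
// populate the `Failed` variant in a real pipeline.
#[derive(Debug)]
enum ParseOutcome {
    Parsed { item_count: usize },
    Failed { message: String, line: u32 },
}

/// Turn any parse result into a reportable record instead of
/// panicking or silently dropping the input.
fn report(path: &str, outcome: &ParseOutcome) -> String {
    match outcome {
        ParseOutcome::Parsed { item_count } => {
            format!("{path}: ok ({item_count} items)")
        }
        ParseOutcome::Failed { message, line } => {
            format!("{path}:{line}: parse failed: {message}")
        }
    }
}

fn main() {
    let bad = ParseOutcome::Failed { message: "unexpected token".into(), line: 12 };
    assert_eq!(report("vec.h", &bad), "vec.h:12: parse failed: unexpected token");
    let ok = ParseOutcome::Parsed { item_count: 3 };
    assert_eq!(report("vec.h", &ok), "vec.h: ok (3 items)");
    println!("parse failure treated as data");
}
```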
Pattern 6: Prefer local traversal hooks
When building analyzers, override the narrowest useful visitor hook instead of one huge catch-all traversal method.
That makes the analysis easier to maintain as the AST evolves.
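The narrow-hook idea looks like this in miniature. The toy `Node` tree and `Visitor` trait are illustrative, not parc's actual `visit` module: the analyzer overrides only the hook it cares about and inherits the default walk for everything else.

```rust
// Toy tree standing in for the AST (illustrative only).
enum Node {
    Function(String, Vec<Node>),
    Stmt(String),
}

trait Visitor {
    // Narrow hook with a default no-op body: override only this.
    fn visit_function(&mut self, _name: &str) {}

    // Default traversal shared by all visitors.
    fn walk(&mut self, node: &Node) {
        match node {
            Node::Function(name, body) => {
                self.visit_function(name);
                for child in body {
                    self.walk(child);
                }
            }
            Node::Stmt(_) => {}
        }
    }
}

/// An analyzer that only cares about function names.
struct FnCollector(Vec<String>);

impl Visitor for FnCollector {
    fn visit_function(&mut self, name: &str) {
        self.0.push(name.to_string());
    }
}

fn main() {
    let tree = Node::Function(
        "outer".into(),
        vec![
            Node::Stmt("int x;".into()),
            Node::Function("inner".into(), vec![]),
        ],
    );
    let mut v = FnCollector(Vec::new());
    v.walk(&tree);
    assert_eq!(v.0, vec!["outer".to_string(), "inner".to_string()]);
    println!("{:?}", v.0);
}
```

When the tree grows new node kinds, only the default `walk` changes; `FnCollector` keeps working untouched.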
Contributor Workflow
This chapter records a practical workflow for changing parc safely.
Smallest-reproducer rule
When fixing or extending the parser, start with the smallest input that demonstrates the issue.
That input should usually be one of:
- a direct `parse::*` snippet
- a reftest file
- a preprocessed snapshot
This keeps parser work focused.
Recommended change sequence
- reproduce the issue with the smallest possible input
- decide whether the right test layer is `parse_api`, reftest, or full-app style
- inspect the AST or failure position with `Printer` or structured errors
- patch the relevant parser module
- rerun the focused tests
- only then widen out to broader test coverage
Choosing the right test layer
Use `parse_api` tests when:
- the bug is a simple grammar acceptance issue
- you only need a success/failure assertion
Use reftests when:
- tree shape matters
- printer output is the clearest regression oracle
Use preprocessed or full-app style fixtures when:
- includes or macro expansion are part of the problem
- driver behavior matters
Grammar-oriented debugging
A good parser debugging loop is:
- isolate the failing syntax
- parse with the right flavor
- inspect the closest AST shape that already works
- patch the grammar in the most local parser file possible
This is usually better than broad speculative rewrites.
AST changes
If you add or change an AST node, review the corresponding surfaces too:
- visitor hooks in `visit`
- printer behavior in `print`
- any book examples that describe the shape
- reftest expectations if printer output changed
Documentation changes
If a syntax family becomes better supported, update the book at the same time. The important places are usually:
- flavor/extension guidance
- unsupported cases
- workflows
- AST or visitor examples
That keeps the book aligned with the real parser contract.
Boundary rule
When changing parc, keep the ownership split explicit:
- `parc` owns preprocessing, parsing, extraction, and source artifacts
- `parc` does not own link evidence or Rust lowering
- do not document parser internals as if they were a shared ABI for the rest of the pipeline
If a change makes the source artifact richer, document the richer source
meaning directly instead of hinting that downstream crates depend on parc
library internals.
Maintenance rule
The maintenance bar is simple:
- add or tighten the smallest useful test first
- keep public contract docs and examples in the same patch
- prefer deleting stale workflow language over preserving it for history
- do not keep dead compatibility stories in the book
Support Tiers
This chapter records a practical support posture for PARC’s public surface.
It is meant to help downstream users judge which parts of the crate are the safest long-term integration points.
Tier 1: Core Consumer Surface
These are the most important public surfaces to depend on:
- `driver`
- `parse`
- `ast`
- `visit`
- `span`
- `loc`
These modules define the main parsing contract of the crate.
Tier 2: Debugging And Inspection Surface
These are public and useful, but more inspection-oriented than contract-critical:
- `print`
- `Debug` views of AST nodes
- formatted error text
They are valuable for debugging and tests, but long-lived tooling should still prefer structured data over formatted strings.
Tier 3: Contributor-Oriented Knowledge
These are important for contributors but should not be treated as downstream contracts:
- parser file organization under `src/parser/`
- helper-module layout
- incidental internal naming
- current implementation decomposition across grammar files
These details may evolve as the parser changes.
Consumer guidance
If you are building external tooling on top of PARC, bias toward Tier 1 surfaces first. Reach for Tier 2 when you need diagnostics or debugging support. Treat Tier 3 as implementation detail unless you are actively contributing to PARC itself.
Hardening Matrix
This chapter translates the large PARC test surface into an explicit hardening ladder.
The important point is not “how many tests exist”. The important point is which surfaces are carrying confidence for real-header parsing, preprocessing, and source extraction.
How To Read The Matrix
Read each surface on three axes:
- hermetic or host-dependent
- parser-only versus scan-first
- success path versus conservative failure path
A surface is stronger when it is:
- hermetic
- scan-first
- repeated deterministically
- tied to a realistic system or library family
Tier 1: Hermetic Canonical Baselines
These are the first surfaces that should stay green on every machine:
- vendored musl `stdint`
- vendored zlib
- vendored libpng builtin-preprocessor success path
- repo-owned `macro_env_a` hostile macro corpus
- repo-owned `type_env_b` hostile type corpus
- parser and extraction corpus fixtures under `src/tests/**`
These matter because they exercise:
- multi-header scanning
- macro and include handling
- extraction into `SourcePackage`
- deterministic behavior without relying on the host toolchain layout
Tier 2: Host-Dependent Canonical Ladders
These should stay green on developer and CI hosts where the headers exist, but they are not the first portability baseline:
- OpenSSL public wrapper extraction
- combined Linux event-loop wrapper extraction
- larger libc and system-header clusters
These surfaces matter because they are closer to the “real ugly header world” target than the small synthetic fixtures.
Tier 3: Hostile And Conservative-Failure Surfaces
These prove that PARC is refusing or degrading honestly instead of pretending to understand everything:
- hostile declaration fixtures
- repo-owned hostile corpora that force builtin-preprocessor macro and typedef expansion
- recovery fixtures
- unsupported or partial declaration families that still emit diagnostics and partial metadata
- extraction-status summaries that distinguish supported, partial, and unsupported output trust
For release purposes, these failures are good when they are:
- deterministic
- diagnostic
- documented
Determinism Anchors
The most important repeat-run anchors right now are:
- vendored musl scan
- vendored zlib scan
- vendored libpng scan
- `macro_env_a` scan
- `type_env_b` scan
- OpenSSL wrapper extraction
- combined Linux event-loop wrapper extraction
If any of those become unstable, the release posture should drop immediately.
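The repeat-run property behind these anchors can be checked generically: run the same scan several times and require byte-identical serialized output. `is_deterministic` and the closures below are an illustrative harness sketch, not parc test code:

```rust
/// Generic repeat-run check: `scan` is any closure that produces a
/// serialized artifact (a stand-in for a parc scan + serialization step).
fn is_deterministic<F: FnMut() -> String>(mut scan: F, runs: usize) -> bool {
    let first = scan();
    (1..runs).all(|_| scan() == first)
}

fn main() {
    // A stable scan passes the check.
    assert!(is_deterministic(|| "pkg{items:42}".to_string(), 5));

    // A run-dependent scan is caught immediately.
    let mut counter = 0;
    let drifting = move || {
        counter += 1;
        format!("pkg{{run:{counter}}}")
    };
    assert!(!is_deterministic(drifting, 2));
    println!("repeat-run anchor check ok");
}
```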
What This Matrix Does Not Mean
This matrix does not mean:
- every random system library now parses perfectly
- every preprocessor corner is solved
- every large host-dependent surface is equally mature
It means the current confidence ladder is explicit instead of implied.
Parser Boundaries
This chapter explains where PARC starts and where it intentionally stops.
PARC owns syntax parsing
PARC is responsible for:
- accepting supported C syntax
- building an AST
- carrying spans
- mapping parse positions back through preprocessor line markers
That is the core boundary of the crate.
PARC does not own full compilation
PARC does not attempt to be:
- a full preprocessor implementation
- a type checker
- a linker-aware analyzer
- a code generator
- a full semantic compiler frontend
These are not accidental omissions. They are part of the intended scope boundary.
Practical layering
A healthy toolchain boundary looks like this:
- a compiler or preprocessor produces acceptable input
- PARC parses it
- a later layer performs semantic analysis, policy checks, or code generation
This keeps PARC focused on syntax and source structure.
Why this matters for consumers
If a downstream tool needs:
- ABI guarantees
- linker truth
- semantic type equivalence
- macro inventories as data
then PARC should be one component in the pipeline, not the whole pipeline.
Why this matters for contributors
When deciding whether a new feature belongs in PARC, a useful question is:
“Does this improve PARC’s syntax parsing and source-structure contract, or does it drag PARC into a later compiler stage?”
If it is mostly a later-stage concern, it probably belongs outside PARC.
Release Checklist
This chapter is a pragmatic checklist for documentation and parser changes before a release.
The important release posture is architectural:
- `parc` releases source/frontend behavior
- it does not release binary or Rust-generation policy
- the tested `SourcePackage` contract matters more than parser-internal churn
Parser changes
Before releasing parser changes:
- confirm the smallest reproducer has a test
- confirm the intended flavor coverage is tested
- confirm the AST shape change is deliberate
- confirm visitor and printer behavior still make sense
Book changes
Before releasing documentation changes:
- confirm the affected public behavior is described in the book
- confirm unsupported or out-of-scope cases are still documented honestly
- confirm examples still match the actual public API names
Error-surface changes
Before releasing changes around errors:
- confirm structured fields still provide the needed information
- avoid treating formatted strings as the real contract
- update the error-surface chapter if the practical behavior changed
Workflow changes
Before releasing changes to the normal integration path:
- update the workflow chapter
- update the API contract chapter if the preferred boundary changed
- update stable-usage guidance if downstream posture should change
Artifact contract changes
Before releasing a SourcePackage shape change:
- confirm the changed field meaning is covered by contract-level tests
- confirm the consuming workflow examples still describe artifact boundaries
- confirm cross-crate composition is still described as tests/examples/harness work, not library coupling
Release gate
parc is ready to release only when:
- `make build` passes
- `make test` passes
- the canonical hardening surfaces are still green:
  - vendored musl `stdint`
  - vendored zlib
  - vendored libpng scan
  - OpenSSL public wrapper extraction
  - libcurl public wrapper extraction
  - combined Linux event-loop wrapper extraction
- deterministic repeated extraction still holds on the canonical large surfaces
- the book still teaches `parc` as the source-meaning crate
- unsupported or partial source behavior is still documented honestly
Final practical rule
If a change would force a downstream PARC consumer to rethink how it parses, traverses, or reports on source, the book should say so explicitly in the same change.
Readiness Scorecard
This chapter ties PARC readiness to real suites instead of vague confidence claims.
Overall Posture
PARC should currently be read as:
- strong on parser and extraction fundamentals
- strong on scan-first vendored baselines
- materially stronger on hostile real-world builtin-preprocessor corners
- intentionally conservative when a large header family cannot be modeled honestly
For Level 1 production, PARC should be read as Linux/ELF-first and canonical-corpus-backed, not as a universal frontend for arbitrary C headers.
For whole-pipeline claims, this score is also capped by downstream gerc
anchors that ingest translated PARC source surfaces in tests/examples.
That is good progress, but it is not the same thing as “finished for every C header in the wild”.
Subsystem Scorecard
- parser entrypoints: high
- AST traversal and printing: high
- extraction to `SourcePackage`: high
- scan-first vendored baselines: high
- hostile-header recovery: medium-high
- built-in preprocessor coverage on ugly system headers: medium-high
- large host-dependent wrapper extraction: medium-high
- deterministic behavior on canonical large surfaces: high
Canonical Readiness Anchors
The release posture should be judged against these anchors first:
- vendored musl `stdint`
- vendored zlib
- vendored libpng scan
- repo-owned `macro_env_a`
- repo-owned `type_env_b`
- OpenSSL public wrapper extraction
- combined Linux event-loop wrapper extraction
- combined Linux event-loop wrapper extraction
If those anchors stay green and deterministic, PARC is earning trust. If they drift, the scorecard should be lowered even if many smaller tests still pass.
What Would Raise Readiness Further
The next meaningful gains would be:
- broader built-in-preprocessor coverage on other hostile width and platform gates beyond the libpng family
- more ugly combined system-header clusters
- more repeat-run deterministic scans on large host-dependent surfaces
- clearer unsupported-case diagnostics for the remaining difficult families
Migration From bic
This chapter documents how to migrate downstream consumers from bic’s frontend extraction
to parc’s SourcePackage contract.
Why migrate
`parc` now owns source-level declaration extraction. bic's `extract.rs` was the legacy location for this logic. The canonical path is now:

C headers -> `parc::scan` / `parc::extract` -> `SourcePackage` -> downstream

bic should consume `parc::ir::SourcePackage` instead of owning its own extraction.
Type mapping
| bic type | parc type | Notes |
|---|---|---|
| `BindingPackage` | `SourcePackage` | parc has no layouts, link, or `bic_version` |
| `BindingItem` | `SourceItem` | Same variant set |
| `BindingType` | `SourceType` | Pointer model differs (see below) |
| `FunctionBinding` | `SourceFunction` | Identical structure |
| `ParameterBinding` | `SourceParameter` | Identical structure |
| `RecordBinding` | `SourceRecord` | No `representation` or `abi_confidence` |
| `FieldBinding` | `SourceField` | No `layout` field |
| `EnumBinding` | `SourceEnum` | Identical structure |
| `TypeAliasBinding` | `SourceTypeAlias` | No `canonical_resolution` |
| `VariableBinding` | `SourceVariable` | Identical structure |
| `UnsupportedItem` | `SourceUnsupported` | Identical structure |
| `CallingConvention` | `CallingConvention` | parc version includes `Unknown(String)` |
| `TypeQualifiers` | `TypeQualifiers` | Identical structure |
| `BindingTarget` | `SourceTarget` | Identical structure |
| `BindingInputs` | `SourceInputs` | Identical structure |
| `BindingDefine` | `SourceDefine` | Identical structure |
| `MacroBinding` | `SourceMacro` | parc drops `function_like` and `category` |
| `DeclarationProvenance` | `DeclarationProvenance` | Identical structure |
| `MacroProvenance` | `MacroProvenance` | Identical structure |
Pointer model difference
bic:

```rust
Pointer {
    pointee: Box<BindingType>,
    const_pointee: bool,        // whether pointee is const
    qualifiers: TypeQualifiers, // qualifiers on the pointer itself
}
```

parc:

```rust
Pointer {
    pointee: Box<SourceType>,
    qualifiers: TypeQualifiers, // is_const means pointee is const
}
```
In parc, `qualifiers.is_const` on a `Pointer` indicates that the pointee is const-qualified.
Use `SourceType::const_ptr(inner)` and `SourceType::ptr(inner)` as constructors.
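The flag-to-qualifier fold can be demonstrated with local mirrors of the two shapes. Everything below is a self-contained sketch (the enums mirror the structures shown above but are not the real bic/parc types):

```rust
// Local mirrors of the two pointer models, for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct TypeQualifiers {
    is_const: bool,
}

#[derive(Debug, PartialEq)]
enum BindingType {
    Int,
    Pointer {
        pointee: Box<BindingType>,
        const_pointee: bool,        // whether pointee is const
        qualifiers: TypeQualifiers, // qualifiers on the pointer itself
    },
}

#[derive(Debug, PartialEq)]
enum SourceType {
    Int,
    Pointer {
        pointee: Box<SourceType>,
        qualifiers: TypeQualifiers, // is_const: pointee is const
    },
}

fn migrate(ty: &BindingType) -> SourceType {
    match ty {
        BindingType::Int => SourceType::Int,
        BindingType::Pointer { pointee, const_pointee, .. } => SourceType::Pointer {
            pointee: Box::new(migrate(pointee)),
            // bic's separate flag folds into parc's qualifier bit; the
            // pointer's own qualifiers have no slot in this sketch.
            qualifiers: TypeQualifiers { is_const: *const_pointee },
        },
    }
}

fn main() {
    // const int *  (the pointee is const)
    let bic_ty = BindingType::Pointer {
        pointee: Box::new(BindingType::Int),
        const_pointee: true,
        qualifiers: TypeQualifiers { is_const: false },
    };
    let parc_ty = migrate(&bic_ty);
    assert_eq!(
        parc_ty,
        SourceType::Pointer {
            pointee: Box::new(SourceType::Int),
            qualifiers: TypeQualifiers { is_const: true },
        }
    );
    println!("pointer model migrated");
}
```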
Missing fields in parc
These bic fields are intentionally absent from parc because they belong to the link/ABI layer:
- `FieldBinding.layout` (field offset) — use LINC probing
- `RecordBinding.representation` — use LINC probing
- `RecordBinding.abi_confidence` — use LINC validation
- `TypeAliasBinding.canonical_resolution` — parc preserves `TypedefRef` chains
- `BindingPackage.layouts` — use LINC probing
- `BindingPackage.link` — use LINC link surface
- `BindingPackage.effective_macro_environment` — use LINC macro analysis
Migration steps
Step 1: Replace extraction call
Before:
```rust
use bic::extract::Extractor;
use bic::ir::BindingPackage;

let extractor = Extractor::new();
let (items, diagnostics) = extractor.extract(&unit);
let mut pkg = BindingPackage::new();
pkg.items = items;
```
After:
```rust
use parc::extract;
use parc::ir::SourcePackage;

let pkg = extract::extract_from_translation_unit(&unit, Some("header.h".into()));
```
Or for end-to-end scanning:
```rust
use parc::scan::{ScanConfig, scan_headers};

let config = ScanConfig::new()
    .entry_header("header.h")
    .with_builtin_preprocessor();
let result = scan_headers(&config).unwrap();
let pkg: &SourcePackage = &result.package;
```
Step 2: Update type references
Replace all uses of BindingType with SourceType, BindingItem with SourceItem, etc.
The variant names are identical.
Step 3: Handle pointer model
Replace const_pointee checks:
```rust
// Before (bic)
if let BindingType::Pointer { const_pointee: true, .. } = ty { ... }

// After (parc)
if let SourceType::Pointer { qualifiers, .. } = ty {
    if qualifiers.is_const { ... }
}
```
Step 4: Remove ABI fields
Any code that reads `FieldBinding.layout`, `RecordBinding.representation`, or
`RecordBinding.abi_confidence` should be moved to LINC’s domain.
Step 5: Use builder for programmatic construction
```rust
use parc::ir::{SourcePackageBuilder, SourceItem, SourceFunction, ...};

let pkg = SourcePackageBuilder::new()
    .source_path("api.h")
    .item(SourceItem::Function(func))
    .item(SourceItem::Record(rec))
    .build();
```
API reference
Key public APIs for downstream consumers:
- `parc::extract::extract_from_source(src)` — parse and extract
- `parc::extract::extract_from_translation_unit(unit, path)` — extract from AST
- `parc::extract::parse_and_extract(src, flavor)` — with flavor control
- `parc::extract::parse_and_extract_resilient(src, flavor)` — with error recovery
- `parc::scan::scan_headers(config)` — end-to-end header scanning
- `parc::ir::SourcePackage` — the contract type
- `parc::ir::SourcePackageBuilder` — programmatic construction
- `SourcePackage::retain_items(pred)` — filter items
- `SourcePackage::merge(other)` — combine packages