Package 'ojoregex' reference manual

Title:	Regex Categorization Tools for OJO Analysts
Description:	This package houses all the regex strings we use in our work. Its main functionality is cleaning charge descriptions from raw OSCN data.
Authors:	Andrew Bell [aut, cre], Brancen Gregory [aut]
Maintainer:	Andrew Bell <[email protected]>
License:	GPL (>= 3)
Version:	0.10.1
Built:	2026-05-26 07:40:36 UTC
Source:	https://github.com/openjusticeok/ojoregex

Add controlling charges to the dataset

Description

This function processes a dataset of charges and adds columns to classify the maximum sentence types and calculate control ranks based on the severity of the sentences.

Usage

ojo_add_controlling_charges(ojo_regex_cats)

Arguments

ojo_regex_cats

A data frame containing charge information, including columns max_sentence_first_offense and max_sentence_any.

Value

A data frame with additional columns for sentence classifications and control ranks.

Apply OJO Regex

Description

This function applies regular expression patterns to clean and categorize charge descriptions in a given dataset.

Usage

ojo_apply_regex(
  data,
  col_to_clean = "count_as_filed",
  .keep_flags = FALSE,
  .include_cats = TRUE,
  .quiet = FALSE
)

Arguments

data

A data frame containing the dataset to be processed.

col_to_clean

The name of the column in the dataset containing the charge descriptions to be cleaned and categorized.

.keep_flags

Logical value indicating whether to keep the concept flags generated during processing. Defaults to FALSE, which returns only the cleaned dataset without the flags.

.include_cats

Logical value indicating whether the categories and subcategories should be included in the returned data.

.quiet

Logical value controlling whether the progress bar is shown. Defaults to FALSE.

Value

A cleaned and categorized dataset with charge descriptions in the specified column, along with any additional columns present in the original dataset.

Examples

## Not run: 
# Load example dataset
data(example_data)

# Apply OJO Regex to clean and categorize charge descriptions
cleaned_data <- ojo_apply_regex(data = example_data, col_to_clean = "charge_description")

## End(Not run)
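
The flag-then-categorize idea described above can be illustrated in base R without the package. This is only a minimal sketch under stated assumptions: the flag names and patterns in flag_patterns are hypothetical stand-ins (the real ones live in ojo_regex_flags), and assigning the first matching flag as the category is a simplification, not necessarily how ojo_apply_regex() resolves categories.

```r
# Hypothetical flag patterns, invented for illustration only.
flag_patterns <- c(
  larceny = "larc",
  drug_possession = "poss.*(cds|controlled)",
  dui = "\\bdui\\b|driving under the influence"
)

charges <- data.frame(
  count_as_filed = c(
    "LARCENY OF MERCHANDISE FROM RETAILER",
    "POSSESSION OF CDS",
    "DUI - ALCOHOL",
    "UNKNOWN CHARGE"
  ),
  stringsAsFactors = FALSE
)

# Build one logical column per flag: TRUE where the pattern matches.
flags <- sapply(
  flag_patterns,
  function(p) grepl(p, charges$count_as_filed, ignore.case = TRUE, perl = TRUE)
)

# Use the first matching flag name as a stand-in category (NA if none match).
charges$category <- apply(flags, 1, function(row) {
  hit <- which(row)
  if (length(hit) == 0) NA_character_ else names(flag_patterns)[hit[1]]
})

print(charges)
```

Keeping the intermediate flags matrix around is loosely analogous to what .keep_flags = TRUE is documented to do: it exposes which concepts matched before they are collapsed into a category.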

Return OJO Regex for a given flag

Description

This function returns the regex string for a given flag.

Usage

ojo_get_flag_regex(flag = NA)

Arguments

flag

The flag you want to get the regex pattern for

Value

A string of regex
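
Because the return value is a single regex string, it can be passed to any base R matching function. The sketch below assumes a stand-in pattern, since the actual pattern text and the valid flag names are not shown in this manual. In practice the string would come from ojo_get_flag_regex().

```r
# Stand-in for a returned pattern; the value "burgl" is an assumption.
pattern <- "burgl"

descriptions <- c(
  "BURGLARY - SECOND DEGREE",
  "PETIT LARCENY",
  "Burglary of auto"
)

# Keep only the descriptions that the pattern matches, ignoring case.
matched <- grep(pattern, descriptions, ignore.case = TRUE, value = TRUE)
print(matched)
```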

Return OJO Regex for a given statute

Description

This function returns the regex string for a given statute.

Usage

ojo_get_statute_regex(statute = NA)

Arguments

statute

The statute you want to get the regex pattern for

Value

A string of regex

OJO Regex Categories dataset

Description

OJO Regex Categories dataset

Usage

ojo_regex_cats

Format

A data frame with X rows and 19 columns:

in_ojoregex: Description of in_ojoregex
clean_charge_description: Description of clean_charge_description
category: Description of category
subcategory: Description of subcategory
title: Description of title
statutes: Description of statutes
chapter: Description of chapter
gist: Description of gist
description: Description of description
statute_link: Description of statute_link
cf_cm: Description of cf_cm
cf_cm_notes: Description of cf_cm_notes
control_rank: Description of control_rank
max_sentence_any: Description of max_sentence_any
max_sentence_first_offense: Description of max_sentence_first_offense
outdated: Description of outdated
notes: Description of notes
sq780_status: Description of sq780_status
violent_crimes_list: Description of violent_crimes_list

OJO Regex Flags Dataset

Description

OJO Regex Flags Dataset

Usage

ojo_regex_flags

Format

A data frame with 139 rows and 8 columns:

flag: Description of column1
regex: Description of column2
group: Description of column3
criminal_or_civil: Description of column4
genre: Description of column5
word_boundary: Description of column6
examples: Description of column7
notes: Description of column8

...

Unhoused / Homeless address regex

Description

A regex pattern that detects whether an address likely indicates that a person is unhoused, for example "homeless", "unhoused", or "NA".

Usage

ojo_regex_unhoused

Format

An object of class character of length 1.
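
A length-one character pattern like this is typically used with grepl() to flag rows. The sketch below is an illustration only: unhoused_pattern is a hypothetical pattern written for this example, not the actual value of ojo_regex_unhoused, which this manual does not show.

```r
# Hypothetical pattern, invented for illustration only.
unhoused_pattern <- "homeless|unhoused|transient|no (fixed|permanent) address|^n/?a$"

addresses <- c(
  "123 Main St",
  "HOMELESS",
  "Transient",
  "N/A",
  "no fixed address",
  "45 Nash Ave"
)

# Flag addresses that likely indicate an unhoused person, ignoring case
# and surrounding whitespace.
likely_unhoused <- grepl(unhoused_pattern, trimws(addresses), ignore.case = TRUE)
print(data.frame(address = addresses, likely_unhoused = likely_unhoused))
```

Anchoring the "NA" alternative with ^ and $ keeps ordinary street names that merely contain those letters, such as "Nash", from being flagged.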

Pre-clean charge descriptions to be matched

Description

This function pre-cleans charge descriptions to be matched by removing specific patterns that are not relevant for matching. It removes phrases like "in concert with" from the end of the charge descriptions.

Usage

regex_pre_clean(count_as_filed)

Arguments

count_as_filed

A character vector containing the charge descriptions to be pre-cleaned.

Value

A character vector with pre-cleaned charge descriptions.

Examples

## Not run: 
# Example usage
clean_text <- regex_pre_clean("TAXS, FAIL TO DISPLAY TAX STAMP ON CDS IN CONCERT W/J POOLE")
clean_text

## End(Not run)
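
To make the kind of cleanup described above concrete, here is a minimal base R sketch that strips a trailing "in concert with" phrase. The helper strip_in_concert() and its pattern are hypothetical. They are not the implementation of regex_pre_clean(), which may remove other patterns as well.

```r
# Hypothetical helper, written for illustration only.
strip_in_concert <- function(x) {
  # Remove "IN CONCERT W/..." or "IN CONCERT WITH ..." through the end
  # of the string, along with any separator directly before it.
  cleaned <- sub(
    "\\s*,?\\s*in concert (w/|with).*$", "", x,
    ignore.case = TRUE, perl = TRUE
  )
  trimws(cleaned)
}

example <- strip_in_concert(
  "TAXS, FAIL TO DISPLAY TAX STAMP ON CDS IN CONCERT W/J POOLE"
)
print(example)

# Descriptions without the phrase pass through unchanged.
unchanged <- strip_in_concert("PETIT LARCENY")
print(unchanged)
```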
