| Title: | Regex Categorization Tools for OJO Analysts |
|---|---|
| Description: | This package houses all the regex strings we use in our work. The main functionality is cleaning charge descriptions from raw OSCN data. |
| Authors: | Andrew Bell [aut, cre], Brancen Gregory [aut] |
| Maintainer: | Andrew Bell <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.10.1 |
| Built: | 2026-05-26 07:40:36 UTC |
| Source: | https://github.com/openjusticeok/ojoregex |
This function processes a dataset of charges and adds columns to classify the maximum sentence types and calculate control ranks based on the severity of the sentences.
ojo_add_controlling_charges(ojo_regex_cats)
ojo_regex_cats |
A data frame containing charge information, including columns |
A data frame with additional columns for sentence classifications and control ranks.
This function applies regular expressions patterns to clean and categorize charge descriptions in a given dataset.
ojo_apply_regex(data, col_to_clean = "count_as_filed", .keep_flags = FALSE, .include_cats = TRUE, .quiet = FALSE)
data |
A data frame containing the dataset to be processed. |
col_to_clean |
The name of the column in the dataset containing the charge descriptions to be cleaned and categorized. |
.keep_flags |
Logical value indicating whether to keep the concept flags generated during processing. Defaults to FALSE, which returns only the cleaned dataset without the flags. |
.include_cats |
Logical value indicating whether the categories and subcategories should be included in the returned data. Defaults to TRUE. |
.quiet |
Logical value indicating whether to suppress the progress bar. Defaults to FALSE, which shows the progress bar. |
A cleaned and categorized dataset with charge descriptions in the specified column, along with any additional columns present in the original dataset.
## Not run:
# Load example dataset
data(example_data)
# Apply OJO Regex to clean and categorize charge descriptions
cleaned_data <- ojo_apply_regex(data = example_data, col_to_clean = "charge_description")
## End(Not run)
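The flag-then-categorize idea described above can be sketched in base R. This is only an illustration under stated assumptions: the charge strings, the flag names and patterns, and the `clean_label` column are all hypothetical, and the package's actual patterns and output columns may differ.

```r
# Hypothetical charge descriptions; these are not real OSCN records.
charges <- data.frame(
  count_as_filed = c(
    "POSSESSION OF CDS",
    "LARCENY OF MERCHANDISE FROM RETAILER",
    "POSS OF CONTROLLED DANGEROUS SUBSTANCE"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical concept flags (assumed names and patterns, for illustration only).
flag_patterns <- c(
  possess = "POSS",
  drugs   = "\\bCDS\\b|CONTROLLED DANGEROUS SUBSTANCE",
  larceny = "LARCENY"
)

# Build one logical column per flag by testing each pattern against the text.
flags <- as.data.frame(lapply(flag_patterns, function(pattern) {
  grepl(pattern, charges$count_as_filed, ignore.case = TRUE, perl = TRUE)
}))

# Combine flags into a single cleaned label (assumed rules).
charges$clean_label <- ifelse(
  flags$possess & flags$drugs, "Possession of CDS",
  ifelse(flags$larceny, "Larceny", NA_character_)
)

# Keeping the flags alongside the result mirrors the idea behind `.keep_flags = TRUE`.
with_flags <- cbind(charges, flags)
```

Separating the flags from the final label keeps each pattern small and lets differently worded descriptions of the same charge collapse to one label.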
This function returns the regex string for a given flag.
ojo_get_flag_regex(flag = NA)
flag |
The flag you want to get the regex pattern for |
A character string containing the regex pattern.
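A lookup of this kind can be sketched in base R as a match against a small table. The table below, its `flag` and `regex` column names, and the helper `get_flag_regex_sketch()` are hypothetical; the real `ojo_regex_flags` columns are not documented here and may be organized differently.

```r
# Hypothetical flag table; column names and patterns are assumptions.
flag_table_sketch <- data.frame(
  flag  = c("larceny", "drugs"),
  regex = c("LARCENY|THEFT", "\\bCDS\\b|DRUG"),
  stringsAsFactors = FALSE
)

# Return the pattern stored for one flag, failing loudly on unknown or missing names.
get_flag_regex_sketch <- function(flag = NA, table = flag_table_sketch) {
  idx <- match(flag, table$flag)
  if (length(idx) != 1 || is.na(idx)) {
    stop("Unknown flag: ", flag)
  }
  table$regex[idx]
}

# The returned string can be passed straight to grepl().
larceny_pattern <- get_flag_regex_sketch("larceny")
is_larceny <- grepl(larceny_pattern, "PETIT LARCENY", ignore.case = TRUE, perl = TRUE)
```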
This function returns the regex string for a given statute.
ojo_get_statute_regex(statute = NA)
statute |
The statute you want to get the regex pattern for |
A character string containing the regex pattern.
OJO Regex Categories dataset
ojo_regex_cats
A data frame with X rows and 16 columns:
Description of in_ojoregex
Description of clean_charge_description
Description of category
Description of subcategory
Description of title
Description of statutes
Description of chapter
Description of gist
Description of description
Description of statute_link
Description of cf_cm
Description of cf_cm_notes
Description of control_rank
Description of max_sentence_any
Description of max_sentence_first_offense
Description of outdated
Description of notes
Description of sq780_status
Description of violent_crimes_list
OJO Regex Flags Dataset
ojo_regex_flags
A data frame with 139 rows and 8 columns:
Description of column1
Description of column2
Description of column3
Description of column4
Description of column5
Description of column6
Description of column7
Description of column8
...
Detects whether an address is likely to indicate "homeless" / "unhoused" / "NA", etc.
ojo_regex_unhoused
An object of class character of length 1.
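Because the object is a single character string, it can presumably be passed directly to a pattern-matching function. The sketch below uses a stand-in pattern, `unhoused_pattern_sketch`, which is hypothetical and much simpler than whatever the package actually ships. The addresses are made up as well.

```r
# Hypothetical stand-in for the packaged pattern; the real regex may differ.
unhoused_pattern_sketch <-
  "\\b(HOMELESS|UNHOUSED|TRANSIENT|NO (FIXED )?ADDRESS)\\b|^N/?A$"

# Made-up address strings for illustration.
addresses <- c("123 N MAIN ST", "HOMELESS", "n/a", "TRANSIENT - TULSA")

# Flag addresses that likely indicate no fixed residence.
likely_unhoused <- grepl(
  unhoused_pattern_sketch,
  trimws(addresses),
  ignore.case = TRUE,
  perl = TRUE
)
```

With the package loaded, the same call would take `ojo_regex_unhoused` in place of the stand-in pattern.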
This function pre-cleans charge descriptions to be matched by removing specific patterns that are not relevant for matching. It removes phrases like "in concert with" from the end of the charge descriptions.
regex_pre_clean(count_as_filed)
count_as_filed |
A character vector containing the charge descriptions to be pre-cleaned. |
A character vector with pre-cleaned charge descriptions.
## Not run:
# Example usage
clean_text <- regex_pre_clean("TAXS, FAIL TO DISPLAY TAX STAMP ON CDS IN CONCERT W/J POOLE")
clean_text
## End(Not run)
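To make the pre-cleaning step concrete, here is a minimal base R sketch that strips a trailing "in concert" phrase. The helper `pre_clean_sketch()` and its pattern are assumptions made for illustration. The real `regex_pre_clean()` may remove other patterns or handle edge cases differently.

```r
# Hypothetical re-implementation of one pre-cleaning rule; not the package's code.
pre_clean_sketch <- function(count_as_filed) {
  # Drop "IN CONCERT" and everything after it (assumed rule).
  cleaned <- sub(
    "\\s+IN CONCERT\\b.*$",
    "",
    count_as_filed,
    ignore.case = TRUE,
    perl = TRUE
  )
  # Remove any leftover leading or trailing whitespace.
  trimws(cleaned)
}

cleaned_example <- pre_clean_sketch(
  c(
    "TAXS, FAIL TO DISPLAY TAX STAMP ON CDS IN CONCERT W/J POOLE",
    "LARCENY OF MERCHANDISE FROM RETAILER"
  )
)
```

Descriptions that do not contain the phrase pass through unchanged, so the function is safe to apply to a whole column.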