access_eval.analysis package¶
Submodules¶
access_eval.analysis.communication module¶
- access_eval.analysis.communication.generate_email_text(head_dir: str | Path) Path [source]¶
Generate email text from data found in the provided directory.
- Parameters:
head_dir (Union[str, Path]) – The directory with all results.
- Returns:
email_text – Path to text file containing suggested email message.
- Return type:
Path
access_eval.analysis.constants module¶
- class access_eval.analysis.constants.ComputedField(name, func)[source]¶
Bases:
NamedTuple
Create new instance of ComputedField(name, func)
- func: Callable¶
- name: str¶
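A ComputedField simply pairs a column name with the function that derives it. The sketch below uses a local stand-in NamedTuple mirroring the class above; the field name and row layout are illustrative assumptions, not the package's exact internals.

```python
from typing import Callable, NamedTuple

# Stand-in mirroring access_eval.analysis.constants.ComputedField,
# defined locally so the example runs without the package.
class ComputedField(NamedTuple):
    name: str
    func: Callable

# Hypothetical computed field: average errors per page after contact.
avg_errors_post = ComputedField(
    name="avg_errors_per_page_post",
    func=lambda row: row["number_of_total_errors_post"] / row["number_of_pages_post"],
)

row = {"number_of_total_errors_post": 84, "number_of_pages_post": 12}
print(avg_errors_post.func(row))  # 7.0
```

The func for each real ComputedFields entry is a lambda over the dataset's rows, which is why the repr above shows `<function ComputedFields.<lambda>>`.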
- class access_eval.analysis.constants.ComputedFields[source]¶
Bases:
object
- avg_critical_errors_per_page_post = ComputedField(name='avg_critical_errors_per_page_post', func=<function ComputedFields.<lambda>>)¶
- avg_critical_errors_per_page_pre = ComputedField(name='avg_critical_errors_per_page_pre', func=<function ComputedFields.<lambda>>)¶
- avg_errors_per_page_post = ComputedField(name='avg_errors_per_page_post', func=<function ComputedFields.<lambda>>)¶
- avg_errors_per_page_pre = ComputedField(name='avg_errors_per_page_pre', func=<function ComputedFields.<lambda>>)¶
- avg_minor_errors_per_page_post = ComputedField(name='avg_minor_errors_per_page_post', func=<function ComputedFields.<lambda>>)¶
- avg_minor_errors_per_page_pre = ComputedField(name='avg_minor_errors_per_page_pre', func=<function ComputedFields.<lambda>>)¶
- avg_moderate_errors_per_page_post = ComputedField(name='avg_moderate_errors_per_page_post', func=<function ComputedFields.<lambda>>)¶
- avg_moderate_errors_per_page_pre = ComputedField(name='avg_moderate_errors_per_page_pre', func=<function ComputedFields.<lambda>>)¶
- avg_number_of_words_per_page = ComputedField(name='avg_number_of_words_per_page', func=<function ComputedFields.<lambda>>)¶
- avg_serious_errors_per_page_post = ComputedField(name='avg_serious_errors_per_page_post', func=<function ComputedFields.<lambda>>)¶
- avg_serious_errors_per_page_pre = ComputedField(name='avg_serious_errors_per_page_pre', func=<function ComputedFields.<lambda>>)¶
- diff_critical_errors = ComputedField(name='diff_critical_errors', func=<function ComputedFields.<lambda>>)¶
- diff_errors = ComputedField(name='diff_errors', func=<function ComputedFields.<lambda>>)¶
- diff_minor_errors = ComputedField(name='diff_minor_errors', func=<function ComputedFields.<lambda>>)¶
- diff_moderate_errors = ComputedField(name='diff_moderate_errors', func=<function ComputedFields.<lambda>>)¶
- diff_pages = ComputedField(name='diff_pages', func=<function ComputedFields.<lambda>>)¶
- diff_serious_errors = ComputedField(name='diff_serious_errors', func=<function ComputedFields.<lambda>>)¶
- class access_eval.analysis.constants.DatasetFields[source]¶
Bases:
object
This class stores all of the headers for the analysis dataset.
Each header will have a description and some examples. Use this class as a data dictionary.
- campaign_website_url = 'campaign_website_url'¶
The public URL for the campaign website.
Examples
- Type:
str
- candidate_funding = 'candidate_funding'¶
The amount of money the candidate received in donations during the campaign.
Examples
100000.00
350000.00
Notes
Calculated as the sum of all other candidates' funding in the same race.
Pulled from external data. (Not all candidates had their websites scraped.)
- Type:
float
- candidate_history = 'candidate_history'¶
Categorical value for the electoral history of the candidate.
Examples
“In-Office”
“Previously-Elected”
“Never-Held-Office”
Notes
Pulled from external data source.
- Type:
str
- candidate_position = 'candidate_position'¶
Categorical value for whether the candidate is the incumbent, a challenger, or running in an open race.
Examples
“Incumbent”
“Challenger”
“Open”
- Type:
str
- contacted = 'contacted'¶
Whether the campaign was contacted with the aXe evaluation summary.
Examples
“Contacted”
“Not-Contacted”
Notes
If the campaign was not contacted, the values for the pre and post features are set equal.
- Type:
str
- ease_of_reading = 'ease_of_reading'¶
The lexical complexity of the entire website. Calculated on the latest version of the website.
See: https://github.com/shivam5992/textstat#the-flesch-reading-ease-formula for more information.
Examples
123.45
-12.34
- Type:
float
- election_result = 'election_result'¶
Categorical value for whether the candidate won (or progressed) or not.
Examples
“Won”
“Lost”
Notes
Pulled from external data source.
- Type:
str
- election_type = 'election_type'¶
Categorical value for the type of election.
Examples
“Primary”
“General”
“Runoff”
- Type:
str
- electoral_position = 'electoral_position'¶
The position the candidate was running for.
Examples
“Mayor”
“Council”
- Type:
str
- eligible_voting_population = 'eligible_voting_population'¶
The total number of people eligible to vote in the election.
Examples
123456
24680
Notes
Pulled from external data source.
- Type:
int
- error_type_x = 'error_type_x'¶
There are many columns that begin with 'error-type_'. Each such column is the aggregate count of that error type X for the campaign.
Examples
“error-type_label_pre”: 12
“error-type_frame-title_post”: 4
Notes
Each of these columns also has a computed field, avg_error-type_x, for both pre and post.
- Type:
int
The amount of money the candidate received in donations over the amount of money all candidates received during the campaign.
Examples
0.21
0.47
- Type:
float
- location = 'location'¶
The municipality or general location where the election took place.
Examples
“Seattle, WA”
“New Orleans, LA”
- Type:
str
- number_of_critical_errors_post = 'number_of_critical_errors_post'¶
The number of errors categorized as “critical” by aXe for the entire website after contact.
Examples
123
42
- Type:
int
- number_of_critical_errors_pre = 'number_of_critical_errors_pre'¶
The number of errors categorized as “critical” by aXe for the entire website before contact.
Examples
123
42
- Type:
int
- number_of_minor_errors_post = 'number_of_minor_errors_post'¶
The number of errors categorized as “minor” by aXe for the entire website after contact.
Examples
123
42
- Type:
int
- number_of_minor_errors_pre = 'number_of_minor_errors_pre'¶
The number of errors categorized as “minor” by aXe for the entire website before contact.
Examples
123
42
- Type:
int
- number_of_moderate_errors_post = 'number_of_moderate_errors_post'¶
The number of errors categorized as “moderate” by aXe for the entire website after contact.
Examples
123
42
- Type:
int
- number_of_moderate_errors_pre = 'number_of_moderate_errors_pre'¶
The number of errors categorized as “moderate” by aXe for the entire website before contact.
Examples
123
42
- Type:
int
- number_of_pages_post = 'number_of_pages_post'¶
The total number of pages found in the whole campaign website after contact.
Examples
12
42
- Type:
int
- number_of_pages_pre = 'number_of_pages_pre'¶
The total number of pages found in the whole campaign website before contact.
Examples
12
42
- Type:
int
- number_of_serious_errors_post = 'number_of_serious_errors_post'¶
The number of errors categorized as “serious” by aXe for the entire website after contact.
Examples
123
42
- Type:
int
- number_of_serious_errors_pre = 'number_of_serious_errors_pre'¶
The number of errors categorized as “serious” by aXe for the entire website before contact.
Examples
123
42
- Type:
int
- number_of_total_errors_post = 'number_of_total_errors_post'¶
The total number of errors for the entire website after contact.
Examples
234
450
- Type:
int
- number_of_total_errors_pre = 'number_of_total_errors_pre'¶
The total number of errors for the entire website before contact.
Examples
234
450
- Type:
int
- number_of_unique_words = 'number_of_unique_words'¶
The total number of unique words found in the whole campaign website. Calculated on the latest version of the website.
Examples
999
1234
- Type:
int
- number_of_votes_for_candidate = 'number_of_votes_for_candidate'¶
The number of votes the candidate ultimately received.
Examples
12345
2468
Notes
Pulled from external data source.
- Type:
int
- number_of_votes_for_race = 'number_of_votes_for_race'¶
The total number of votes returned in the election.
Examples
123456
24680
Notes
Pulled from external data source.
- Type:
int
- number_of_words = 'number_of_words'¶
The total number of words found in the whole campaign website. Calculated on the latest version of the website.
Examples
9999
12345
- Type:
int
- race_funding = 'race_funding'¶
The amount of money all candidates in the race received during the campaign.
Examples
10000000.00
24500000.00
Notes
Pulled from external data source.
- Type:
float
- trial = 'trial'¶
The categorical variable added when the data is flattened from independent “pre” and “post” columns into shared columns.
Examples
“Pre”
“Post”
Notes
This is only added with the flattened data.
- Type:
str
The number of votes the candidate received over the number of votes possible.
Examples
0.21
0.47
- Type:
float
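Because DatasetFields stores each header as a string constant, the class can be used as a data dictionary when indexing rows. A minimal sketch with a stand-in subset of the constants (the row values are made up for illustration):

```python
# Stand-in with a subset of the real DatasetFields constants.
class DatasetFields:
    campaign_website_url = "campaign_website_url"
    election_result = "election_result"
    number_of_pages_pre = "number_of_pages_pre"
    number_of_pages_post = "number_of_pages_post"

row = {
    DatasetFields.campaign_website_url: "website.org",
    DatasetFields.election_result: "Won",
    DatasetFields.number_of_pages_pre: 10,
    DatasetFields.number_of_pages_post: 12,
}

# Using the constants instead of raw strings guards against typos
# in column names and keeps the pre/post naming convention in one place.
diff_pages = row[DatasetFields.number_of_pages_post] - row[DatasetFields.number_of_pages_pre]
print(diff_pages)  # 2
```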
access_eval.analysis.core module¶
- class access_eval.analysis.core.CompiledMetrics(pages: int = 0, minor_violations: int = 0, moderate_violations: int = 0, serious_violations: int = 0, critical_violations: int = 0, number_of_words: int = 0, number_of_unique_words: int = 0, ease_of_reading: float = 0.0, error_types: Dict[str, int] | None = None)[source]¶
Bases:
object
- critical_violations: int = 0¶
- ease_of_reading: float = 0.0¶
- error_types: Dict[str, int] | None = None¶
- classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) A ¶
- classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) A ¶
- minor_violations: int = 0¶
- moderate_violations: int = 0¶
- number_of_unique_words: int = 0¶
- number_of_words: int = 0¶
- pages: int = 0¶
- classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) SchemaF[A] ¶
- serious_violations: int = 0¶
- to_dict(encode_json=False) Dict[str, dict | list | str | int | float | bool | None] ¶
- to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) str ¶
- class access_eval.analysis.core.RunningMetrics(pages: int = 0, minor_violations: int = 0, moderate_violations: int = 0, serious_violations: int = 0, critical_violations: int = 0, word_metrics: Dict[str, access_eval.analysis.core.WordMetric | None] | None = None)[source]¶
Bases:
object
- critical_violations: int = 0¶
- classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) A ¶
- classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) A ¶
- minor_violations: int = 0¶
- moderate_violations: int = 0¶
- pages: int = 0¶
- classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) SchemaF[A] ¶
- serious_violations: int = 0¶
- to_dict(encode_json=False) Dict[str, dict | list | str | int | float | bool | None] ¶
- to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) str ¶
- word_metrics: Dict[str, WordMetric | None] | None = None¶
- class access_eval.analysis.core.WordMetric(words: int, unique_words: Set[str], ease_of_reading: float)[source]¶
Bases:
object
- ease_of_reading: float¶
- classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) A ¶
- classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) A ¶
- classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) SchemaF[A] ¶
- to_dict(encode_json=False) Dict[str, dict | list | str | int | float | bool | None] ¶
- to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) str ¶
- unique_words: Set[str]¶
- words: int¶
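WordMetric stores per-page text features: total words, the set of unique words, and an ease-of-reading score (the package computes the latter with textstat's Flesch reading ease, per the ease_of_reading notes above). A rough stdlib-only sketch of the word counts, with a hypothetical tokenizer that is an assumption, not the package's exact rule:

```python
import re

def word_metrics(text: str) -> dict:
    # Hypothetical helper approximating what a WordMetric stores for one
    # page: total word count and the set of unique lower-cased words.
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return {"words": len(words), "unique_words": set(words)}

m = word_metrics("Vote early. Vote often.")
print(m["words"], len(m["unique_words"]))  # 4 3
```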
- access_eval.analysis.core.combine_election_data_with_axe_results(election_data: str | Path | DataFrame, pre_contact_axe_scraping_results: str | Path, post_contact_axe_scraping_results: str | Path) DataFrame [source]¶
Combine election data CSV (or in memory DataFrame) with the axe results for each campaign website.
- Parameters:
election_data (Union[str, Path, pd.DataFrame]) – The path to, or the in-memory dataframe containing, basic election data. This CSV or dataframe should contain a column “campaign_website_url” that can be used to find the associated directory of axe results for that campaign’s website.
pre_contact_axe_scraping_results (Union[str, Path]) – The path to the directory that contains sub-directories for each campaign website’s axe results. I.e. data/site-a and data/site-b, provide the directory “data” as both “site-a” and “site-b” are direct children.
post_contact_axe_scraping_results (Union[str, Path]) – The path to the directory that contains sub-directories for each campaign website’s axe results. I.e. data/site-a and data/site-b, provide the directory “data” as both “site-a” and “site-b” are direct children.
- Returns:
full_data – The original election data, the summed violation counts for both pre and post contact, and the scraped text features using the post-contact aXe URLs for each campaign website combined into a single dataframe.
- Return type:
pd.DataFrame
Notes
For both the *_axe_scraping_results parameters, provide the parent directory of all individual campaign axe scraping result directories.
I.e. if the individual campaign result directories are stored under pre-data/ and post-data/, provide the parameters as “pre-data/” and “post-data/” respectively.
Additionally, if the provided campaign website url is missing from either the pre or post axe results directories, the site is skipped / dropped from the expanded dataset.
Finally, any https:// or http:// is dropped from the campaign url. I.e. in the spreadsheet the value is https://website.org but the associated directory should be: pre-data/website.org
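The scheme-stripping rule in the final note can be sketched as a small helper (the helper name is hypothetical; only the described behavior is from the docs):

```python
def url_to_dir_name(campaign_url: str) -> str:
    # Strip the scheme so "https://website.org" maps to the
    # axe results directory named "website.org".
    for scheme in ("https://", "http://"):
        if campaign_url.startswith(scheme):
            return campaign_url[len(scheme):]
    return campaign_url

print(url_to_dir_name("https://website.org"))  # website.org
```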
- access_eval.analysis.core.flatten_access_eval_2021_dataset(data: DataFrame | None = None) DataFrame [source]¶
Flatten the access eval 2021 dataset by adding a new column called “Trial”, which stores a categorical value of “Pre” or “Post”. This simplifies each pre/post column pair (e.g. “avg_errors_per_page_pre” and “avg_errors_per_page_post”) into a single column (“avg_errors_per_page”).
- Parameters:
data (pd.DataFrame) – Preloaded access eval data. Default: None (load access eval 2021 data)
- Returns:
flattened – The flattened dataset.
- Return type:
pd.DataFrame
Notes
This returns only a subset of the full dataset, notably dropping the “diff” computed fields.
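The flattening described above can be sketched on plain dicts (the real function operates on a pandas DataFrame; the helper below is an illustrative assumption, not the package's implementation):

```python
def flatten_row(row: dict) -> list:
    # Split one wide row with *_pre / *_post columns into two rows
    # that share the base column names plus a "trial" value.
    shared = {k: v for k, v in row.items() if not k.endswith(("_pre", "_post"))}
    pre = {**shared, "trial": "Pre"}
    post = {**shared, "trial": "Post"}
    for key, value in row.items():
        if key.endswith("_pre"):
            pre[key[: -len("_pre")]] = value
        elif key.endswith("_post"):
            post[key[: -len("_post")]] = value
    return [pre, post]

rows = flatten_row({"location": "Seattle, WA",
                    "avg_errors_per_page_pre": 3.0,
                    "avg_errors_per_page_post": 1.5})
print(rows[0]["avg_errors_per_page"], rows[1]["avg_errors_per_page"])  # 3.0 1.5
```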
- access_eval.analysis.core.get_crucial_stats(data: DataFrame | None = None) Dict[str, Any] [source]¶
Generate statistics we found useful in the 2021 paper.
This includes:
- mayoral vs council campaigns by content features
- percent of total errors per each error severity level
- majority of ease of reading range
- ordered most common error types
- winning vs losing campaigns by content features
- winning vs losing campaigns by average errors per page
- access_eval.analysis.core.load_access_eval_2021_dataset(path: str | Path | None = None) DataFrame [source]¶
Load the default access eval 2021 dataset or a provided custom dataset and add all computed fields.
- Parameters:
path (Optional[Union[str, Path]]) – An optional path for custom data to load. Default: None (load official 2021 access eval dataset)
- Returns:
data – The loaded dataframe object with all extra computed fields added.
- Return type:
pd.DataFrame
- access_eval.analysis.core.process_axe_evaluations_and_extras(axe_results_dir: str | Path, generate_extras: bool = False) CompiledMetrics [source]¶
Process all aXe evaluations and generate extra features (words, ease of reading, etc.) for the provided aXe result tree. Extras are optional to generate.
- Parameters:
axe_results_dir (Union[str, Path]) – The directory for a specific website that has been processed using the access eval scraper.
generate_extras (bool) – Should the extra features be generated? Default: False (do not generate extra features)
- Returns:
metrics – The counts of all violation levels summed for the whole axe results tree (and optional extra features).
- Return type:
CompiledMetrics
access_eval.analysis.parse_axe_results module¶
- class access_eval.analysis.parse_axe_results.AggregateAxeViolation(id: str, impact: str, impact_score: int, reason: str, number_of_pages_affected: int, number_of_elements_in_violation: int, help_url: str)[source]¶
Bases:
object
- classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) A ¶
- classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) A ¶
- help_url: str¶
- id: str¶
- impact: str¶
- impact_score: int¶
- number_of_elements_in_violation: int¶
- number_of_pages_affected: int¶
- reason: str¶
- classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) SchemaF[A] ¶
- to_dict(encode_json=False) Dict[str, dict | list | str | int | float | bool | None] ¶
- to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) str ¶
- class access_eval.analysis.parse_axe_results.AxeImpact[source]¶
Bases:
object
- critical: str = 'critical'¶
- minor: str = 'minor'¶
- moderate: str = 'moderate'¶
- serious: str = 'serious'¶
- class access_eval.analysis.parse_axe_results.SimplifiedAxeViolation(id: str, impact: str, impact_score: int, reason: str, number_of_elements_in_violation: int, help_url: str)[source]¶
Bases:
object
- classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) A ¶
- classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) A ¶
- help_url: str¶
- id: str¶
- impact: str¶
- impact_score: int¶
- number_of_elements_in_violation: int¶
- reason: str¶
- classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) SchemaF[A] ¶
- to_dict(encode_json=False) Dict[str, dict | list | str | int | float | bool | None] ¶
- to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) str ¶
- access_eval.analysis.parse_axe_results.generate_high_level_statistics(head_dir: str | Path) None [source]¶
Recursively glob all directories for aXe results and generate high-level statistics for both single pages and the whole website.
- Parameters:
head_dir (Union[str, Path]) – The directory to start the recursive glob for axe results in.
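The recursive-glob step can be sketched with pathlib; note the result file name used here is an assumption for illustration, not the package's actual file layout:

```python
import json
import tempfile
from pathlib import Path

def collect_axe_results(head_dir: Path) -> list:
    # Recursively find every JSON result file anywhere under head_dir,
    # whether it belongs to a single page or a whole site.
    return sorted(head_dir.rglob("*.json"))

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site-a" / "about").mkdir(parents=True)
    (root / "site-a" / "about" / "axe-results.json").write_text(json.dumps({"violations": []}))
    print(len(collect_axe_results(root)))  # 1
```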
access_eval.analysis.plotting module¶
- access_eval.analysis.plotting.plot_candidate_position_based_summary_stats(data: DataFrame | None = None) None [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_categorical_against_errors_boxplots(data: DataFrame | None = None) List[Path] [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_election_result_based_summary_stats(data: DataFrame | None = None) None [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_electoral_position_based_summary_stats(data: DataFrame | None = None) None [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_error_types_boxplots(data: DataFrame | None = None) Path [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_location_based_summary_stats(data: DataFrame | None = None) None [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_locations_against_errors_boxplots(data: DataFrame | None = None) Path [source]¶
Input data should be the “flattened” dataset.
- access_eval.analysis.plotting.plot_pre_post_errors(data: DataFrame | None = None) None [source]¶
Input data should be the “flattened” dataset.
access_eval.analysis.utils module¶
- access_eval.analysis.utils.unpack_data(zipfile: str | Path = PosixPath('/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/access_eval/analysis/data/pre-access-eval-results.zip'), dest: str | Path = PosixPath('unpacked-pre-access-eval-results'), clean: bool = False) Path [source]¶
Unzips the zipfile to the destination location.
- Parameters:
zipfile (Union[str, Path]) – The zipfile to unpack. Default: The 2021 campaign accessibility evaluation pre-contact data.
dest (Union[str, Path]) – The destination to unpack to. Default: The default location for unpacked “pre-contact” data.
clean (bool) – If a directory already exists at the destination location, should the directory be removed entirely before unpacking again. Default: False (raise an error if a directory already exists)
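The behavior described for the clean parameter can be sketched with the stdlib zipfile and shutil modules (a minimal stand-in, not the package's implementation):

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def unpack(archive: Path, dest: Path, clean: bool = False) -> Path:
    # If dest already exists, either remove it first (clean=True)
    # or raise, mirroring the documented default behavior.
    if dest.exists():
        if not clean:
            raise FileExistsError(f"{dest} already exists")
        shutil.rmtree(dest)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest

# Demo: build a tiny zip, then unpack it.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "results.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("site-a/summary.txt", "ok")
    out = unpack(archive, tmp / "unpacked")
    print((out / "site-a" / "summary.txt").read_text())  # ok
```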
Module contents¶
Analysis package for access-eval.