access_eval.analysis package

Submodules

access_eval.analysis.communication module

access_eval.analysis.communication.generate_email_text(head_dir: str | Path) → Path[source]

Generate email text from data found in the provided directory.

Parameters:

head_dir (Union[str, Path]) – The directory with all results.

Returns:

email_text – Path to text file containing suggested email message.

Return type:

Path
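A minimal usage sketch; the “results/” path is a hypothetical results directory:

    from access_eval.analysis.communication import generate_email_text

    # Generate the suggested email message from the results directory
    # and get back the path to the written text file.
    email_file = generate_email_text("results/")
    print(email_file.read_text())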

access_eval.analysis.constants module

class access_eval.analysis.constants.ComputedField(name, func)[source]

Bases: NamedTuple

Create new instance of ComputedField(name, func)

func: Callable
name: str
class access_eval.analysis.constants.ComputedFields[source]

Bases: object

avg_critical_errors_per_page_post = ComputedField(name='avg_critical_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_critical_errors_per_page_pre = ComputedField(name='avg_critical_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_errors_per_page_post = ComputedField(name='avg_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_errors_per_page_pre = ComputedField(name='avg_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_minor_errors_per_page_post = ComputedField(name='avg_minor_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_minor_errors_per_page_pre = ComputedField(name='avg_minor_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_moderate_errors_per_page_post = ComputedField(name='avg_moderate_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_moderate_errors_per_page_pre = ComputedField(name='avg_moderate_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_number_of_words_per_page = ComputedField(name='avg_number_of_words_per_page', func=<function ComputedFields.<lambda>>)
avg_serious_errors_per_page_post = ComputedField(name='avg_serious_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_serious_errors_per_page_pre = ComputedField(name='avg_serious_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
diff_critical_errors = ComputedField(name='diff_critical_errors', func=<function ComputedFields.<lambda>>)
diff_errors = ComputedField(name='diff_errors', func=<function ComputedFields.<lambda>>)
diff_minor_errors = ComputedField(name='diff_minor_errors', func=<function ComputedFields.<lambda>>)
diff_moderate_errors = ComputedField(name='diff_moderate_errors', func=<function ComputedFields.<lambda>>)
diff_pages = ComputedField(name='diff_pages', func=<function ComputedFields.<lambda>>)
diff_serious_errors = ComputedField(name='diff_serious_errors', func=<function ComputedFields.<lambda>>)
vote_share_per_critical_error = ComputedField(name='vote_share_per_critical_error', func=<function ComputedFields.<lambda>>)
vote_share_per_error = ComputedField(name='vote_share_per_error', func=<function ComputedFields.<lambda>>)
vote_share_per_minor_error = ComputedField(name='vote_share_per_minor_error', func=<function ComputedFields.<lambda>>)
vote_share_per_moderate_error = ComputedField(name='vote_share_per_moderate_error', func=<function ComputedFields.<lambda>>)
vote_share_per_serious_error = ComputedField(name='vote_share_per_serious_error', func=<function ComputedFields.<lambda>>)
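Each ComputedField above pairs an output column name with the function that derives it. A minimal sketch of the pattern, constructing an illustrative field by hand; the assumption that each func maps the full dataset DataFrame to a derived column is for illustration and is not necessarily the library's exact lambda signature:

    import pandas as pd

    from access_eval.analysis.constants import ComputedField

    # Illustrative computed field: average pre-contact errors per page.
    avg_errors_per_page_pre = ComputedField(
        name="avg_errors_per_page_pre",
        func=lambda df: df["number_of_total_errors_pre"] / df["number_of_pages_pre"],
    )

    df = pd.DataFrame({
        "number_of_total_errors_pre": [120, 40],
        "number_of_pages_pre": [10, 8],
    })
    # Attach the derived column under the field's name.
    df[avg_errors_per_page_pre.name] = avg_errors_per_page_pre.func(df)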
class access_eval.analysis.constants.DatasetFields[source]

Bases: object

This class stores all of the headers for the analysis dataset.

Each header will have a description and some examples. Use this class as a data dictionary.

campaign_website_url = 'campaign_website_url'

The public URL for the campaign website.

Examples

  • “https://website.org”

Type:

str

candidate_funding = 'candidate_funding'

The amount of money the candidate received in donations during the campaign.

Examples

  • 100000.00

  • 350000.00

Notes

Calculated as the sum of all other candidates’ funding in the same race.

Pulled from external data. (Not all candidates had their websites scraped.)

Type:

float

candidate_history = 'candidate_history'

Categorical value for the electoral history of the candidate.

Examples

  • “In-Office”

  • “Previously-Elected”

  • “Never-Held-Office”

Notes

Pulled from external data source.

Type:

str

candidate_position = 'candidate_position'

Categorical value for whether the candidate is the incumbent, a challenger, or running for an open seat.

Examples

  • “Incumbent”

  • “Challenger”

  • “Open”

Type:

str

contacted = 'contacted'

Whether the campaign was contacted with the aXe evaluation summary.

Examples

  • “Contacted”

  • “Not-Contacted”

Notes

If the campaign was not contacted, the pre and post feature values are set equal to each other.

Type:

str

ease_of_reading = 'ease_of_reading'

The lexical complexity of the entire website. Calculated on the latest version of the website.

See: https://github.com/shivam5992/textstat#the-flesch-reading-ease-formula for more information.

Examples

  • 123.45

  • -12.34

Type:

float

election_result = 'election_result'

Categorical value for whether the candidate won (or progressed) or not.

Examples

  • “Won”

  • “Lost”

Notes

Pulled from external data source.

Type:

str

election_type = 'election_type'

Categorical value for the type of election.

Examples

  • “Primary”

  • “General”

  • “Runoff”

Type:

str

electoral_position = 'electoral_position'

The position the candidate was running for.

Examples

  • “Mayor”

  • “Council”

Type:

str

eligible_voting_population = 'eligible_voting_population'

The total number of people eligible to vote in the election.

Examples

  • 123456

  • 24680

Notes

Pulled from external data source.

Type:

int

error_type_x = 'error_type_x'

There are many columns that begin with ‘error-type_’. Each such column is the aggregate count of that error type X for that campaign.

Examples

  • “error-type_label_pre”: 12

  • “error-type_frame-title_post”: 4

Notes

Each of these columns also has a computed field, avg_error-type_x, for both pre and post.

Type:

int

funding_share = 'funding_share'

The amount of money the candidate received in donations divided by the amount of money all candidates received during the campaign.

Examples

  • 0.21

  • 0.47

Type:

float

location = 'location'

The municipality or general location where the election took place.

Examples

  • “Seattle, WA”

  • “New Orleans, LA”

Type:

str

number_of_critical_errors_post = 'number_of_critical_errors_post'

The number of errors categorized as “critical” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_critical_errors_pre = 'number_of_critical_errors_pre'

The number of errors categorized as “critical” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_minor_errors_post = 'number_of_minor_errors_post'

The number of errors categorized as “minor” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_minor_errors_pre = 'number_of_minor_errors_pre'

The number of errors categorized as “minor” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_moderate_errors_post = 'number_of_moderate_errors_post'

The number of errors categorized as “moderate” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_moderate_errors_pre = 'number_of_moderate_errors_pre'

The number of errors categorized as “moderate” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_pages_post = 'number_of_pages_post'

The total number of pages found in the whole campaign website after contact.

Examples

  • 12

  • 42

Type:

int

number_of_pages_pre = 'number_of_pages_pre'

The total number of pages found in the whole campaign website before contact.

Examples

  • 12

  • 42

Type:

int

number_of_serious_errors_post = 'number_of_serious_errors_post'

The number of errors categorized as “serious” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_serious_errors_pre = 'number_of_serious_errors_pre'

The number of errors categorized as “serious” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_total_errors_post = 'number_of_total_errors_post'

The total number of errors for the entire website after contact.

Examples

  • 234

  • 450

Type:

int

number_of_total_errors_pre = 'number_of_total_errors_pre'

The total number of errors for the entire website before contact.

Examples

  • 234

  • 450

Type:

int

number_of_unique_words = 'number_of_unique_words'

The total number of unique words found in the whole campaign website. Calculated on the latest version of the website.

Examples

  • 999

  • 1234

Type:

int

number_of_votes_for_candidate = 'number_of_votes_for_candidate'

The number of votes the candidate ultimately received.

Examples

  • 12345

  • 2468

Notes

Pulled from external data source.

Type:

int

number_of_votes_for_race = 'number_of_votes_for_race'

The total number of votes returned in the election.

Examples

  • 123456

  • 24680

Notes

Pulled from external data source.

Type:

int

number_of_words = 'number_of_words'

The total number of words found in the whole campaign website. Calculated on the latest version of the website.

Examples

  • 9999

  • 12345

Type:

int

race_funding = 'race_funding'

The amount of money all candidates in the race received during the campaign.

Examples

  • 10000000.00

  • 24500000.00

Notes

Pulled from external data source.

Type:

float

trial = 'trial'

The categorical variable added when the data has been flattened from independent “pre” and “post” columns into shared columns.

Examples

  • “Pre”

  • “Post”

Notes

This is only added with the flattened data.

Type:

str

vote_share = 'vote_share'

The number of votes the candidate received divided by the number of votes possible.

Examples

  • 0.21

  • 0.47

Type:

float
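Because every attribute value is simply the column header string, DatasetFields can be used as a data dictionary when selecting columns. A small sketch with hypothetical rows:

    import pandas as pd

    from access_eval.analysis.constants import DatasetFields

    # Prefer the class attributes over hard-coded column strings.
    df = pd.DataFrame({
        DatasetFields.election_result: ["Won", "Lost"],
        DatasetFields.vote_share: [0.55, 0.45],
    })
    winners = df[df[DatasetFields.election_result] == "Won"]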

access_eval.analysis.core module

class access_eval.analysis.core.CompiledMetrics(pages: int = 0, minor_violations: int = 0, moderate_violations: int = 0, serious_violations: int = 0, critical_violations: int = 0, number_of_words: int = 0, number_of_unique_words: int = 0, ease_of_reading: float = 0.0, error_types: Dict[str, int] | None = None)[source]

Bases: object

critical_violations: int = 0
ease_of_reading: float = 0.0
error_types: Dict[str, int] | None = None
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
minor_violations: int = 0
moderate_violations: int = 0
number_of_unique_words: int = 0
number_of_words: int = 0
pages: int = 0
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
serious_violations: int = 0
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
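The from_dict / from_json / to_dict / to_json helpers listed above follow the dataclasses-json pattern, so a CompiledMetrics object can be round-tripped through JSON. A minimal sketch:

    from access_eval.analysis.core import CompiledMetrics

    # Serialize a metrics object and rebuild it from the JSON payload.
    metrics = CompiledMetrics(pages=10, serious_violations=3)
    payload = metrics.to_json()
    restored = CompiledMetrics.from_json(payload)
    assert restored.pages == metrics.pages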
class access_eval.analysis.core.RunningMetrics(pages: int = 0, minor_violations: int = 0, moderate_violations: int = 0, serious_violations: int = 0, critical_violations: int = 0, word_metrics: Dict[str, access_eval.analysis.core.WordMetric | None] | None = None)[source]

Bases: object

critical_violations: int = 0
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
minor_violations: int = 0
moderate_violations: int = 0
pages: int = 0
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
serious_violations: int = 0
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
word_metrics: Dict[str, WordMetric | None] | None = None
class access_eval.analysis.core.WordMetric(words: int, unique_words: Set[str], ease_of_reading: float)[source]

Bases: object

ease_of_reading: float
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
unique_words: Set[str]
words: int
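A sketch of constructing a WordMetric record directly from the signature above; the values are hypothetical:

    from access_eval.analysis.core import WordMetric

    wm = WordMetric(
        words=250,
        unique_words={"vote", "campaign", "accessibility"},
        ease_of_reading=67.5,
    )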
access_eval.analysis.core.combine_election_data_with_axe_results(election_data: str | Path | DataFrame, pre_contact_axe_scraping_results: str | Path, post_contact_axe_scraping_results: str | Path) → DataFrame[source]

Combine an election data CSV (or in-memory DataFrame) with the aXe results for each campaign website.

Parameters:
  • election_data (Union[str, Path, pd.DataFrame]) – The path to a CSV of, or an in-memory dataframe containing, basic election data. This CSV or dataframe should contain a column “campaign_website_url” that can be used to find the associated directory of axe results for that campaign’s website.

  • pre_contact_axe_scraping_results (Union[str, Path]) – The path to the directory that contains sub-directories for each campaign website’s axe results. I.e. for data/site-a and data/site-b, provide the directory “data”, as both “site-a” and “site-b” are direct children.

  • post_contact_axe_scraping_results (Union[str, Path]) – The same directory layout as pre_contact_axe_scraping_results, but containing the post-contact axe results.

Returns:

full_data – The original election data, the summed violation counts for both pre and post contact, and the scraped text features using the post-contact aXe URLs for each campaign website combined into a single dataframe.

Return type:

pd.DataFrame

Notes

For both the *_axe_scraping_results parameters, provide the parent directory of all individual campaign axe scraping result directories.

I.e. if the data is stored like so:

|- pre-data/
   |- site-a/
   |- site-b/
|- post-data/
   |- site-a/
   |- site-b/

Provide the parameters as “pre-data/” and “post-data/” respectively.

Additionally, if the provided campaign website url is missing from either the pre or post axe results directories, the site is skipped / dropped from the expanded dataset.

Finally, any https:// or http:// prefix is dropped from the campaign URL. I.e. if the value in the spreadsheet is https://website.org, the associated directory should be: pre-data/website.org
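A usage sketch following the directory layout in the Notes; all paths are hypothetical:

    from access_eval.analysis.core import combine_election_data_with_axe_results

    # Election data CSV plus the pre- and post-contact result trees.
    full_data = combine_election_data_with_axe_results(
        election_data="election-data.csv",
        pre_contact_axe_scraping_results="pre-data/",
        post_contact_axe_scraping_results="post-data/",
    )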

access_eval.analysis.core.flatten_access_eval_2021_dataset(data: DataFrame | None = None) → DataFrame[source]

Flatten the access eval 2021 dataset by adding a new column called “Trial” which stores a categorical value: “Pre” or “Post”. This allows paired columns to be merged, e.g. a single “avg_errors_per_page” column instead of both “avg_errors_per_page_pre” and “avg_errors_per_page_post”.

Parameters:

data (pd.DataFrame) – Preloaded access eval data. Default: None (load access eval 2021 data)

Returns:

flattened – The flattened dataset.

Return type:

pd.DataFrame

Notes

This returns only a subset of the full dataset, notably dropping the “diff” computed fields.
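A minimal sketch; with no argument the packaged 2021 dataset is loaded and flattened:

    from access_eval.analysis.core import flatten_access_eval_2021_dataset

    # The result has a "Trial" column ("Pre"/"Post") and shared column
    # names in place of the paired *_pre / *_post columns.
    flat = flatten_access_eval_2021_dataset()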

access_eval.analysis.core.get_crucial_stats(data: DataFrame | None = None) → Dict[str, Any][source]

Generate statistics we found useful in the 2021 paper.

This includes:

  • mayoral vs council campaigns by content features

  • percent of total errors per error severity level

  • majority ease of reading range

  • ordered most common error types

  • winning vs losing campaigns by content features

  • winning vs losing campaigns by average errors per page
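A usage sketch over the default dataset:

    from access_eval.analysis.core import get_crucial_stats

    # Returns a plain dict of the named statistics.
    stats = get_crucial_stats()
    for name, value in stats.items():
        print(name, value)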

access_eval.analysis.core.load_access_eval_2021_dataset(path: str | Path | None = None) → DataFrame[source]

Load the default access eval 2021 dataset or a provided custom dataset and add all computed fields.

Parameters:

path (Optional[Union[str, Path]]) – An optional path for custom data to load. Default: None (load official 2021 access eval dataset)

Returns:

data – The loaded dataframe object with all extra computed fields added.

Return type:

pd.DataFrame
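A minimal sketch; the custom CSV filename is hypothetical:

    from access_eval.analysis.core import load_access_eval_2021_dataset

    # Default: the packaged 2021 dataset, with computed fields added.
    data = load_access_eval_2021_dataset()

    # Or load a custom dataset with the same columns.
    custom = load_access_eval_2021_dataset("my-election-data.csv")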

access_eval.analysis.core.process_axe_evaluations_and_extras(axe_results_dir: str | Path, generate_extras: bool = False) → CompiledMetrics[source]

Process all aXe evaluations and generate extra features (words, ease of reading, etc.) for the provided aXe result tree. Generating the extras is optional.

Parameters:
  • axe_results_dir (Union[str, Path]) – The directory for a specific website that has been processed using the access eval scraper.

  • generate_extras (bool) – Should the extra features be generated? Default: False (do not generate extra features)

Returns:

metrics – The counts of all violation levels summed for the whole axe results tree (and optional extra features).

Return type:

CompiledMetrics
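A usage sketch; the results directory is hypothetical and should be a single website’s scraper output:

    from access_eval.analysis.core import process_axe_evaluations_and_extras

    # Sum violation counts across the aXe result tree and, optionally,
    # compute the text features (words, ease of reading, ...).
    metrics = process_axe_evaluations_and_extras(
        "pre-data/website.org",
        generate_extras=True,
    )
    print(metrics.pages, metrics.critical_violations)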

access_eval.analysis.parse_axe_results module

class access_eval.analysis.parse_axe_results.AggregateAxeViolation(id: str, impact: str, impact_score: int, reason: str, number_of_pages_affected: int, number_of_elements_in_violation: int, help_url: str)[source]

Bases: object

classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
help_url: str
id: str
impact: str
impact_score: int
number_of_elements_in_violation: int
number_of_pages_affected: int
reason: str
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
class access_eval.analysis.parse_axe_results.AxeImpact[source]

Bases: object

critical: str = 'critical'
minor: str = 'minor'
moderate: str = 'moderate'
serious: str = 'serious'
class access_eval.analysis.parse_axe_results.SimplifiedAxeViolation(id: str, impact: str, impact_score: int, reason: str, number_of_elements_in_violation: int, help_url: str)[source]

Bases: object

classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
help_url: str
id: str
impact: str
impact_score: int
number_of_elements_in_violation: int
reason: str
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
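A sketch of constructing a SimplifiedAxeViolation by hand; in practice these records are parsed from aXe result files, and all field values below (including the impact score and help URL) are hypothetical:

    from access_eval.analysis.parse_axe_results import (
        AxeImpact,
        SimplifiedAxeViolation,
    )

    violation = SimplifiedAxeViolation(
        id="frame-title",
        impact=AxeImpact.serious,
        impact_score=3,
        reason="Frames must have an accessible name",
        number_of_elements_in_violation=4,
        help_url="https://example.org/rules/frame-title",
    )
    print(violation.to_json())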
access_eval.analysis.parse_axe_results.generate_high_level_statistics(head_dir: str | Path) → None[source]

Recursively glob all directories for aXe results and generate high-level statistics for both single pages and the whole website.

Parameters:

head_dir (Union[str, Path]) – The directory to start the recursive glob for axe results in.
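A usage sketch; the head directory is hypothetical:

    from access_eval.analysis.parse_axe_results import (
        generate_high_level_statistics,
    )

    # Recursively walk the tree of scraped sites for aXe results.
    generate_high_level_statistics("post-data/")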

access_eval.analysis.plotting module

access_eval.analysis.plotting.plot_candidate_position_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_categorical_against_errors_boxplots(data: DataFrame | None = None) → List[Path][source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_computed_fields_over_vote_share(data: DataFrame | None = None, save_path: str | Path | None = None) → Path[source]
access_eval.analysis.plotting.plot_election_result_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_electoral_position_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_error_types_boxplots(data: DataFrame | None = None) → Path[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_location_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_locations_against_errors_boxplots(data: DataFrame | None = None) → Path[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_pre_post_errors(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_pre_post_fields_compare(data: DataFrame | None = None, save_path: str | Path | None = None) → Path[source]
access_eval.analysis.plotting.plot_summary_stats(data: DataFrame | None = None, subset_name: str = '', keep_cols: List[str] = [], plot_kwargs: Dict[str, Any] = {}) → None[source]

Input data should be the “flattened” dataset.
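All of these helpers expect the “flattened” dataset. A minimal sketch using the default 2021 data; the subset name is an arbitrary label:

    from access_eval.analysis.core import flatten_access_eval_2021_dataset
    from access_eval.analysis.plotting import plot_summary_stats

    flat = flatten_access_eval_2021_dataset()
    plot_summary_stats(flat, subset_name="all-campaigns")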

access_eval.analysis.utils module

access_eval.analysis.utils.unpack_data(zipfile: str | Path = PosixPath('/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/access_eval/analysis/data/pre-access-eval-results.zip'), dest: str | Path = PosixPath('unpacked-pre-access-eval-results'), clean: bool = False) → Path[source]

Unzips the zipfile to the destination location.

Parameters:
  • zipfile (Union[str, Path]) – The zipfile to unpack. Default: The 2021 campaign accessibility evaluation pre-contact data.

  • dest (Union[str, Path]) – The destination to unpack to. Default: The default location for unpacked “pre-contact” data.

  • clean (bool) – If a directory already exists at the destination location, whether to remove it entirely before unpacking. Default: False (raise an error if a directory already exists)
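A minimal sketch; with no arguments the packaged pre-contact data is unpacked to the default location:

    from access_eval.analysis.utils import unpack_data

    # clean=True removes any existing directory at the destination
    # before unpacking, instead of raising an error.
    unpacked = unpack_data(clean=True)
    print(unpacked)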

Module contents

Analysis package for access-eval.