access_eval.analysis package

Submodules

access_eval.analysis.communication module

access_eval.analysis.communication.generate_email_text(head_dir: str | Path) → Path[source]

Generate email text from data found in the provided directory.

Parameters:

head_dir (Union[str, Path]) – The directory with all results.

Returns:

email_text – Path to text file containing suggested email message.

Return type:

Path
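A minimal usage sketch; the “results/” path is a hypothetical results directory:

    from access_eval.analysis.communication import generate_email_text

    # Generate the suggested email message from the results directory
    # and get back the path to the written text file.
    email_file = generate_email_text("results/")
    print(email_file.read_text())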

access_eval.analysis.constants module

class access_eval.analysis.constants.ComputedField(name, func)[source]

Bases: NamedTuple

Create new instance of ComputedField(name, func)

func: Callable
name: str
class access_eval.analysis.constants.ComputedFields[source]

Bases: object

avg_critical_errors_per_page_post = ComputedField(name='avg_critical_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_critical_errors_per_page_pre = ComputedField(name='avg_critical_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_errors_per_page_post = ComputedField(name='avg_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_errors_per_page_pre = ComputedField(name='avg_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_minor_errors_per_page_post = ComputedField(name='avg_minor_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_minor_errors_per_page_pre = ComputedField(name='avg_minor_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_moderate_errors_per_page_post = ComputedField(name='avg_moderate_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_moderate_errors_per_page_pre = ComputedField(name='avg_moderate_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
avg_number_of_words_per_page = ComputedField(name='avg_number_of_words_per_page', func=<function ComputedFields.<lambda>>)
avg_serious_errors_per_page_post = ComputedField(name='avg_serious_errors_per_page_post', func=<function ComputedFields.<lambda>>)
avg_serious_errors_per_page_pre = ComputedField(name='avg_serious_errors_per_page_pre', func=<function ComputedFields.<lambda>>)
diff_critical_errors = ComputedField(name='diff_critical_errors', func=<function ComputedFields.<lambda>>)
diff_errors = ComputedField(name='diff_errors', func=<function ComputedFields.<lambda>>)
diff_minor_errors = ComputedField(name='diff_minor_errors', func=<function ComputedFields.<lambda>>)
diff_moderate_errors = ComputedField(name='diff_moderate_errors', func=<function ComputedFields.<lambda>>)
diff_pages = ComputedField(name='diff_pages', func=<function ComputedFields.<lambda>>)
diff_serious_errors = ComputedField(name='diff_serious_errors', func=<function ComputedFields.<lambda>>)
vote_share_per_critical_error = ComputedField(name='vote_share_per_critical_error', func=<function ComputedFields.<lambda>>)
vote_share_per_error = ComputedField(name='vote_share_per_error', func=<function ComputedFields.<lambda>>)
vote_share_per_minor_error = ComputedField(name='vote_share_per_minor_error', func=<function ComputedFields.<lambda>>)
vote_share_per_moderate_error = ComputedField(name='vote_share_per_moderate_error', func=<function ComputedFields.<lambda>>)
vote_share_per_serious_error = ComputedField(name='vote_share_per_serious_error', func=<function ComputedFields.<lambda>>)
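Each ComputedField above pairs an output column name with the function that derives it. A minimal sketch of the pattern, constructing an illustrative field by hand; the assumption that each func maps the full dataset DataFrame to a derived column is for illustration and is not necessarily the library's exact lambda signature:

    import pandas as pd

    from access_eval.analysis.constants import ComputedField

    # Illustrative computed field: average pre-contact errors per page.
    avg_errors_per_page_pre = ComputedField(
        name="avg_errors_per_page_pre",
        func=lambda df: df["number_of_total_errors_pre"] / df["number_of_pages_pre"],
    )

    df = pd.DataFrame({
        "number_of_total_errors_pre": [120, 40],
        "number_of_pages_pre": [10, 8],
    })
    # Attach the derived column under the field's name.
    df[avg_errors_per_page_pre.name] = avg_errors_per_page_pre.func(df)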
class access_eval.analysis.constants.DatasetFields[source]

Bases: object

This class stores all of the headers for the analysis dataset.

Each header will have a description and some examples. Use this class as a data dictionary.

campaign_website_url = 'campaign_website_url'

The public URL for the campaign website.

Examples

  • “https://website.org”

Type:

str

candidate_funding = 'candidate_funding'

The amount of money the candidate received in donations during the campaign.

Examples

  • 100000.00

  • 350000.00

Notes

Calculated as the sum of all other candidates’ funding in the same race.

Pulled from external data. (Not all candidates had their websites scraped.)

Type:

float

candidate_history = 'candidate_history'

Categorical value for the electoral history of the candidate.

Examples

  • “In-Office”

  • “Previously-Elected”

  • “Never-Held-Office”

Notes

Pulled from external data source.

Type:

str

candidate_position = 'candidate_position'

Categorical value for whether the candidate is the incumbent, a challenger, or running for an open seat.

Examples

  • “Incumbent”

  • “Challenger”

  • “Open”

Type:

str

contacted = 'contacted'

Whether the campaign was contacted with the aXe evaluation summary.

Examples

  • “Contacted”

  • “Not-Contacted”

Notes

If the campaign was not contacted, the pre and post feature values are set equal to each other.

Type:

str

ease_of_reading = 'ease_of_reading'

The lexical complexity of the entire website. Calculated on the latest version of the website.

See: https://github.com/shivam5992/textstat#the-flesch-reading-ease-formula for more information.

Examples

  • 123.45

  • -12.34

Type:

float

election_result = 'election_result'

Categorical value for whether the candidate won (or progressed) or not.

Examples

  • “Won”

  • “Lost”

Notes

Pulled from external data source.

Type:

str

election_type = 'election_type'

Categorical value for the type of election.

Examples

  • “Primary”

  • “General”

  • “Runoff”

Type:

str

electoral_position = 'electoral_position'

The position the candidate was running for.

Examples

  • “Mayor”

  • “Council”

Type:

str

eligible_voting_population = 'eligible_voting_population'

The total number of people eligible to vote in the election.

Examples

  • 123456

  • 24680

Notes

Pulled from external data source.

Type:

int

error_type_x = 'error_type_x'

There are many columns that begin with ‘error-type_’. Each such column is the aggregate count of that error type X for that campaign.

Examples

  • “error-type_label_pre”: 12

  • “error-type_frame-title_post”: 4

Notes

Each of these columns also has a computed field, avg_error-type_x, for both pre and post.

Type:

int

funding_share = 'funding_share'

The amount of money the candidate received in donations divided by the amount of money all candidates received during the campaign.

Examples

  • 0.21

  • 0.47

Type:

float

location = 'location'

The municipality or general location where the election took place.

Examples

  • “Seattle, WA”

  • “New Orleans, LA”

Type:

str

number_of_critical_errors_post = 'number_of_critical_errors_post'

The number of errors categorized as “critical” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_critical_errors_pre = 'number_of_critical_errors_pre'

The number of errors categorized as “critical” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_minor_errors_post = 'number_of_minor_errors_post'

The number of errors categorized as “minor” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_minor_errors_pre = 'number_of_minor_errors_pre'

The number of errors categorized as “minor” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_moderate_errors_post = 'number_of_moderate_errors_post'

The number of errors categorized as “moderate” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_moderate_errors_pre = 'number_of_moderate_errors_pre'

The number of errors categorized as “moderate” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_pages_post = 'number_of_pages_post'

The total number of pages found in the whole campaign website after contact.

Examples

  • 12

  • 42

Type:

int

number_of_pages_pre = 'number_of_pages_pre'

The total number of pages found in the whole campaign website before contact.

Examples

  • 12

  • 42

Type:

int

number_of_serious_errors_post = 'number_of_serious_errors_post'

The number of errors categorized as “serious” by aXe for the entire website after contact.

Examples

  • 123

  • 42

Type:

int

number_of_serious_errors_pre = 'number_of_serious_errors_pre'

The number of errors categorized as “serious” by aXe for the entire website before contact.

Examples

  • 123

  • 42

Type:

int

number_of_total_errors_post = 'number_of_total_errors_post'

The total number of errors for the entire website after contact.

Examples

  • 234

  • 450

Type:

int

number_of_total_errors_pre = 'number_of_total_errors_pre'

The total number of errors for the entire website before contact.

Examples

  • 234

  • 450

Type:

int

number_of_unique_words = 'number_of_unique_words'

The total number of unique words found in the whole campaign website. Calculated on the latest version of the website.

Examples

  • 999

  • 1234

Type:

int

number_of_votes_for_candidate = 'number_of_votes_for_candidate'

The number of votes the candidate ultimately received.

Examples

  • 12345

  • 2468

Notes

Pulled from external data source.

Type:

int

number_of_votes_for_race = 'number_of_votes_for_race'

The total number of votes returned in the election.

Examples

  • 123456

  • 24680

Notes

Pulled from external data source.

Type:

int

number_of_words = 'number_of_words'

The total number of words found in the whole campaign website. Calculated on the latest version of the website.

Examples

  • 9999

  • 12345

Type:

int

race_funding = 'race_funding'

The amount of money all candidates in the race received during the campaign.

Examples

  • 10000000.00

  • 24500000.00

Notes

Pulled from external data source.

Type:

float

trial = 'trial'

The categorical variable added when the data has been flattened from independent “pre” and “post” columns into shared columns.

Examples

  • “Pre”

  • “Post”

Notes

This is only added with the flattened data.

Type:

str

vote_share = 'vote_share'

The number of votes the candidate received divided by the number of votes possible.

Examples

  • 0.21

  • 0.47

Type:

float
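Because every attribute value is simply the column header string, DatasetFields can be used as a data dictionary when selecting columns. A small sketch with hypothetical rows:

    import pandas as pd

    from access_eval.analysis.constants import DatasetFields

    # Prefer the class attributes over hard-coded column strings.
    df = pd.DataFrame({
        DatasetFields.election_result: ["Won", "Lost"],
        DatasetFields.vote_share: [0.55, 0.45],
    })
    winners = df[df[DatasetFields.election_result] == "Won"]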

access_eval.analysis.core module

class access_eval.analysis.core.CompiledMetrics(pages: int = 0, minor_violations: int = 0, moderate_violations: int = 0, serious_violations: int = 0, critical_violations: int = 0, number_of_words: int = 0, number_of_unique_words: int = 0, ease_of_reading: float = 0.0, error_types: Dict[str, int] | None = None)[source]

Bases: object

critical_violations: int = 0
ease_of_reading: float = 0.0
error_types: Dict[str, int] | None = None
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
minor_violations: int = 0
moderate_violations: int = 0
number_of_unique_words: int = 0
number_of_words: int = 0
pages: int = 0
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
serious_violations: int = 0
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
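The from_dict / from_json / to_dict / to_json helpers listed above follow the dataclasses-json pattern, so a CompiledMetrics object can be round-tripped through JSON. A minimal sketch:

    from access_eval.analysis.core import CompiledMetrics

    # Serialize a metrics object and rebuild it from the JSON payload.
    metrics = CompiledMetrics(pages=10, serious_violations=3)
    payload = metrics.to_json()
    restored = CompiledMetrics.from_json(payload)
    assert restored.pages == metrics.pages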
class access_eval.analysis.core.RunningMetrics(pages: int = 0, minor_violations: int = 0, moderate_violations: int = 0, serious_violations: int = 0, critical_violations: int = 0, word_metrics: Dict[str, access_eval.analysis.core.WordMetric | None] | None = None)[source]

Bases: object

critical_violations: int = 0
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
minor_violations: int = 0
moderate_violations: int = 0
pages: int = 0
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
serious_violations: int = 0
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
word_metrics: Dict[str, WordMetric | None] | None = None
class access_eval.analysis.core.WordMetric(words: int, unique_words: Set[str], ease_of_reading: float)[source]

Bases: object

ease_of_reading: float
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
unique_words: Set[str]
words: int
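A sketch of constructing a WordMetric record directly from the signature above; the values are hypothetical:

    from access_eval.analysis.core import WordMetric

    wm = WordMetric(
        words=250,
        unique_words={"vote", "campaign", "accessibility"},
        ease_of_reading=67.5,
    )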
access_eval.analysis.core.combine_election_data_with_axe_results(election_data: str | Path | DataFrame, pre_contact_axe_scraping_results: str | Path, post_contact_axe_scraping_results: str | Path) → DataFrame[source]

Combine an election data CSV (or in-memory DataFrame) with the aXe results for each campaign website.

Parameters:
  • election_data (Union[str, Path, pd.DataFrame]) – The path to a CSV of, or an in-memory dataframe containing, basic election data. This CSV or dataframe should contain a column “campaign_website_url” that can be used to find the associated directory of axe results for that campaign’s website.

  • pre_contact_axe_scraping_results (Union[str, Path]) – The path to the directory that contains sub-directories for each campaign website’s axe results. I.e. for data/site-a and data/site-b, provide the directory “data”, as both “site-a” and “site-b” are direct children.

  • post_contact_axe_scraping_results (Union[str, Path]) – The same directory layout as pre_contact_axe_scraping_results, but containing the post-contact axe results.

Returns:

full_data – The original election data, the summed violation counts for both pre and post contact, and the scraped text features using the post-contact aXe URLs for each campaign website combined into a single dataframe.

Return type:

pd.DataFrame

Notes

For both the *_axe_scraping_results parameters, provide the parent directory of all individual campaign axe scraping result directories.

I.e. if the data is stored like so:

|- pre-data/
   |- site-a/
   |- site-b/
|- post-data/
   |- site-a/
   |- site-b/

Provide the parameters as “pre-data/” and “post-data/” respectively.

Additionally, if the provided campaign website url is missing from either the pre or post axe results directories, the site is skipped / dropped from the expanded dataset.

Finally, any https:// or http:// prefix is dropped from the campaign URL. I.e. if the value in the spreadsheet is https://website.org, the associated directory should be: pre-data/website.org
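A usage sketch following the directory layout in the Notes; all paths are hypothetical:

    from access_eval.analysis.core import combine_election_data_with_axe_results

    # Election data CSV plus the pre- and post-contact result trees.
    full_data = combine_election_data_with_axe_results(
        election_data="election-data.csv",
        pre_contact_axe_scraping_results="pre-data/",
        post_contact_axe_scraping_results="post-data/",
    )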

access_eval.analysis.core.flatten_access_eval_2021_dataset(data: DataFrame | None = None) → DataFrame[source]

Flatten the access eval 2021 dataset by adding a new column called “Trial” which stores a categorical value: “Pre” or “Post”. This allows paired columns to be merged, e.g. a single “avg_errors_per_page” column instead of both “avg_errors_per_page_pre” and “avg_errors_per_page_post”.

Parameters:

data (pd.DataFrame) – Preloaded access eval data. Default: None (load access eval 2021 data)

Returns:

flattened – The flattened dataset.

Return type:

pd.DataFrame

Notes

This returns only a subset of the full dataset, notably dropping the “diff” computed fields.
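A minimal sketch; with no argument the packaged 2021 dataset is loaded and flattened:

    from access_eval.analysis.core import flatten_access_eval_2021_dataset

    # The result has a "Trial" column ("Pre"/"Post") and shared column
    # names in place of the paired *_pre / *_post columns.
    flat = flatten_access_eval_2021_dataset()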

access_eval.analysis.core.get_crucial_stats(data: DataFrame | None = None) → Dict[str, Any][source]

Generate statistics we found useful in the 2021 paper.

This includes:

  • mayoral vs council campaigns by content features

  • percent of total errors per error severity level

  • majority ease of reading range

  • ordered most common error types

  • winning vs losing campaigns by content features

  • winning vs losing campaigns by average errors per page
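A usage sketch over the default dataset:

    from access_eval.analysis.core import get_crucial_stats

    # Returns a plain dict of the named statistics.
    stats = get_crucial_stats()
    for name, value in stats.items():
        print(name, value)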

access_eval.analysis.core.load_access_eval_2021_dataset(path: str | Path | None = None) → DataFrame[source]

Load the default access eval 2021 dataset or a provided custom dataset and add all computed fields.

Parameters:

path (Optional[Union[str, Path]]) – An optional path for custom data to load. Default: None (load official 2021 access eval dataset)

Returns:

data – The loaded dataframe object with all extra computed fields added.

Return type:

pd.DataFrame
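A minimal sketch; the custom CSV filename is hypothetical:

    from access_eval.analysis.core import load_access_eval_2021_dataset

    # Default: the packaged 2021 dataset, with computed fields added.
    data = load_access_eval_2021_dataset()

    # Or load a custom dataset with the same columns.
    custom = load_access_eval_2021_dataset("my-election-data.csv")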

access_eval.analysis.core.process_axe_evaluations_and_extras(axe_results_dir: str | Path, generate_extras: bool = False) → CompiledMetrics[source]

Process all aXe evaluations and generate extra features (words, ease of reading, etc.) for the provided aXe result tree. Generating the extras is optional.

Parameters:
  • axe_results_dir (Union[str, Path]) – The directory for a specific website that has been processed using the access eval scraper.

  • generate_extras (bool) – Should the extra features be generated? Default: False (do not generate extra features)

Returns:

metrics – The counts of all violation levels summed for the whole axe results tree (and optional extra features).

Return type:

CompiledMetrics
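A usage sketch; the results directory is hypothetical and should be a single website’s scraper output:

    from access_eval.analysis.core import process_axe_evaluations_and_extras

    # Sum violation counts across the aXe result tree and, optionally,
    # compute the text features (words, ease of reading, ...).
    metrics = process_axe_evaluations_and_extras(
        "pre-data/website.org",
        generate_extras=True,
    )
    print(metrics.pages, metrics.critical_violations)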

access_eval.analysis.parse_axe_results module

class access_eval.analysis.parse_axe_results.AggregateAxeViolation(id: str, impact: str, impact_score: int, reason: str, number_of_pages_affected: int, number_of_elements_in_violation: int, help_url: str)[source]

Bases: object

classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
help_url: str
id: str
impact: str
impact_score: int
number_of_elements_in_violation: int
number_of_pages_affected: int
reason: str
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
class access_eval.analysis.parse_axe_results.AxeImpact[source]

Bases: object

critical: str = 'critical'
minor: str = 'minor'
moderate: str = 'moderate'
serious: str = 'serious'
class access_eval.analysis.parse_axe_results.SimplifiedAxeViolation(id: str, impact: str, impact_score: int, reason: str, number_of_elements_in_violation: int, help_url: str)[source]

Bases: object

classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
help_url: str
id: str
impact: str
impact_score: int
number_of_elements_in_violation: int
reason: str
classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
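A sketch of constructing a SimplifiedAxeViolation by hand; in practice these records are parsed from aXe result files, and all field values below (including the impact score and help URL) are hypothetical:

    from access_eval.analysis.parse_axe_results import (
        AxeImpact,
        SimplifiedAxeViolation,
    )

    violation = SimplifiedAxeViolation(
        id="frame-title",
        impact=AxeImpact.serious,
        impact_score=3,
        reason="Frames must have an accessible name",
        number_of_elements_in_violation=4,
        help_url="https://example.org/rules/frame-title",
    )
    print(violation.to_json())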
access_eval.analysis.parse_axe_results.generate_high_level_statistics(head_dir: str | Path) → None[source]

Recursively glob all directories for aXe results and generate high-level statistics for both single pages and the whole website.

Parameters:

head_dir (Union[str, Path]) – The directory to start the recursive glob for axe results in.
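A usage sketch; the head directory is hypothetical:

    from access_eval.analysis.parse_axe_results import (
        generate_high_level_statistics,
    )

    # Recursively walk the tree of scraped sites for aXe results.
    generate_high_level_statistics("post-data/")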

access_eval.analysis.plotting module

access_eval.analysis.plotting.plot_candidate_position_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_categorical_against_errors_boxplots(data: DataFrame | None = None) → List[Path][source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_computed_fields_over_vote_share(data: DataFrame | None = None, save_path: str | Path | None = None) → Path[source]
access_eval.analysis.plotting.plot_election_result_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_electoral_position_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_error_types_boxplots(data: DataFrame | None = None) → Path[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_location_based_summary_stats(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_locations_against_errors_boxplots(data: DataFrame | None = None) → Path[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_pre_post_errors(data: DataFrame | None = None) → None[source]

Input data should be the “flattened” dataset.

access_eval.analysis.plotting.plot_pre_post_fields_compare(data: DataFrame | None = None, save_path: str | Path | None = None) → Path[source]
access_eval.analysis.plotting.plot_summary_stats(data: DataFrame | None = None, subset_name: str = '', keep_cols: List[str] = [], plot_kwargs: Dict[str, Any] = {}) → None[source]

Input data should be the “flattened” dataset.
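All of these helpers expect the “flattened” dataset. A minimal sketch using the default 2021 data; the subset name is an arbitrary label:

    from access_eval.analysis.core import flatten_access_eval_2021_dataset
    from access_eval.analysis.plotting import plot_summary_stats

    flat = flatten_access_eval_2021_dataset()
    plot_summary_stats(flat, subset_name="all-campaigns")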

access_eval.analysis.utils module

access_eval.analysis.utils.unpack_data(zipfile: str | Path = PosixPath('/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/access_eval/analysis/data/pre-access-eval-results.zip'), dest: str | Path = PosixPath('unpacked-pre-access-eval-results'), clean: bool = False) → Path[source]

Unzips the zipfile to the destination location.

Parameters:
  • zipfile (Union[str, Path]) – The zipfile to unpack. Default: The 2021 campaign accessibility evaluation pre-contact data.

  • dest (Union[str, Path]) – The destination to unpack to. Default: The default location for unpacked “pre-contact” data.

  • clean (bool) – If a directory already exists at the destination location, whether to remove it entirely before unpacking. Default: False (raise an error if a directory already exists)
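A minimal sketch; with no arguments the packaged pre-contact data is unpacked to the default location:

    from access_eval.analysis.utils import unpack_data

    # clean=True removes any existing directory at the destination
    # before unpacking, instead of raising an error.
    unpacked = unpack_data(clean=True)
    print(unpacked)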

Module contents

Analysis package for access-eval.