Primitive API Reference

AbsoluteDiff

Calculates the absolute difference from the previous element in a list of numbers.

Description

The absolute difference from the previous element is computed for all elements in the input. The first item in the output will always be nan, since there is no previous element for the first element. Elements in the input containing nan will be filled using either a forward-fill or backward-fill method, specified by the method argument.
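The fill-then-difference behavior described above can be sketched in plain Python. This is an illustration only, not the primitive's actual implementation; the helper names `fill_missing` and `absolute_diff` are hypothetical, and missing values are assumed to arrive as `None` or nan.

```python
import math

def fill_missing(values, method="ffill", limit=None):
    """Fill None/nan entries from the nearest valid neighbour in one direction."""
    seq = [math.nan if v is None else float(v) for v in values]
    if method == "bfill":
        seq.reverse()  # a backward fill is a forward fill on the reversed list
    last, run = math.nan, 0
    filled = []
    for v in seq:
        if math.isnan(v):
            run += 1
            # Only the first `limit` entries of a consecutive gap are filled.
            if limit is None or run <= limit:
                v = last
        else:
            last, run = v, 0
        filled.append(v)
    if method == "bfill":
        filled.reverse()
    return filled

def absolute_diff(values, method="ffill", limit=None):
    """Absolute difference from the previous element; the first output is nan."""
    filled = fill_missing(values, method, limit)
    out = [math.nan]
    for prev, cur in zip(filled, filled[1:]):
        out.append(abs(cur - prev))  # any nan operand propagates to the result
    return out

print(absolute_diff([2, 5, 15, 3]))
```

Because nan propagates through subtraction, any gap that is left unfilled produces nan both at its own position and at the position that follows it.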

Examples

>>> absolute_diff = AbsoluteDiff()
>>> absolute_diff([2, 5, 15, 3]).tolist()
[nan, 3.0, 10.0, 12.0]

Forward filling of input elements using the 'ffill' argument

>>> absolute_diff_ffill = AbsoluteDiff(method="ffill")
>>> absolute_diff_ffill([None, 5, 10, 20, None, 10, None]).tolist()
[nan, nan, 5.0, 10.0, 0.0, 10.0, 0.0]

Backward filling of input elements using the 'bfill' argument

>>> absolute_diff_bfill = AbsoluteDiff(method="bfill")
>>> absolute_diff_bfill([None, 5, 10, 20, None, 10, None]).tolist()
[nan, 0.0, 5.0, 10.0, 10.0, 0.0, nan]

The number of nan values that are filled can be limited

>>> absolute_diff_limitfill = AbsoluteDiff(limit=2)
>>> absolute_diff_limitfill([2, None, None, None, 3, 1]).tolist()
[nan, 0.0, 0.0, nan, nan, 2.0]

Type

  • Transform

Arguments

  • method (str)

  • limit (int)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

Age

Calculates the age in years as a floating point number given a date of birth.

Description

Age in years is computed by calculating the number of days between the date of birth and the reference time and dividing the result by 365.
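The calculation described above can be reproduced with the standard library alone. This is a minimal sketch; `age_in_years` is a hypothetical helper name, not part of the library.

```python
from datetime import date

def age_in_years(date_of_birth, reference):
    """Days between the two dates, divided by a fixed 365-day year."""
    return (reference - date_of_birth).days / 365

reference = date(2019, 1, 1)
births = [date(2000, 1, 1), date(1983, 5, 30), date(1997, 10, 17)]
ages = [age_in_years(b, reference) for b in births]
print(ages)
```

Because the divisor is a fixed 365 and leap days are not compensated for, a person measured exactly on their birthday gets an age slightly above a whole number (about 19.014 rather than 19 in the first case).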

Examples

Determine the age of three people as of Jan 1, 2019

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age = Age()
>>> input_ages = [pd.to_datetime("01-01-2000"),
... pd.to_datetime("05-30-1983"),
... pd.to_datetime("10-17-1997")]
>>> age(input_ages, time=reference_date).tolist()
[19.013698630136986, 35.61643835616438, 21.221917808219178]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

AgeOver18

Determines whether a person is over 18 years old given their date of birth.

Description

Returns True if the person's age is greater than or equal to 18 years. Returns False if the age is less than 18 years of age. Returns nan if the age is not defined or doesn't exist.

Examples

Determine whether two people, born on Jan 1, 2000 and June 1, 2010, are over 18 years old as of January 1, 2019.

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age_over_18 = AgeOver18()
>>> input_ages = [pd.to_datetime("01-01-2000"), pd.to_datetime("06-01-2010")]
>>> age_over_18(input_ages, time=reference_date).tolist()
[True, False]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Boolean

Requirements

featuretools>=0.5.1

AgeOver25

Determines whether a person is over 25 years old given their date of birth.

Description

Returns True if the person's age is greater than or equal to 25 years. Returns False if the age is less than 25 years of age. Returns nan if the age is not defined or doesn't exist.

Examples

Determine whether two people, born on Jan 1, 2000 and June 1, 1990, are over 25 years old as of January 1, 2019.

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age_over_25 = AgeOver25()
>>> input_ages = [pd.to_datetime("01-01-2000"), pd.to_datetime("06-01-1990")]
>>> age_over_25(input_ages, time=reference_date).tolist()
[False, True]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Boolean

Requirements

featuretools>=0.5.1

AgeOver65

Determines whether a person is over 65 years old given their date of birth.

Description

Returns True if the person's age is greater than or equal to 65 years. Returns False if the age is less than 65 years of age. Returns nan if the age is not defined or doesn't exist.

Examples

Determine whether two people, born on Jan 1, 1950 and Jan 1, 2000, are over 65 years old as of January 1, 2019.

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age_over_65 = AgeOver65()
>>> input_ages = [pd.to_datetime("01-01-1950"), pd.to_datetime("01-01-2000")]
>>> age_over_65(input_ages, time=reference_date).tolist()
[True, False]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Boolean

Requirements

featuretools>=0.5.1

AgeUnder18

Determines whether a person is under 18 years old given their date of birth.

Description

Returns True if the person's age is less than 18 years. Returns False if the age is more than or equal to 18 years. Returns nan if the age is not defined, or doesn't exist.

Examples

Determine whether two people, born on Jan 1, 2000 and June 1, 2010, are under 18 years old as of January 1, 2019.

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age_under_18 = AgeUnder18()
>>> input_ages = [pd.to_datetime("01-01-2000"), pd.to_datetime("06-01-2010")]
>>> age_under_18(input_ages, time=reference_date).tolist()
[False, True]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Boolean

Requirements

featuretools>=0.5.1

AgeUnder25

Determines whether a person is under 25 years old given their date of birth.

Description

Returns True if the person's age is less than 25 years. Returns False if the age is more than or equal to 25 years. Returns nan if the age is not defined, or doesn't exist.

Examples

Determine whether two people are under age 25 as of January 1, 2019.

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age_under_25 = AgeUnder25()
>>> input_ages = [pd.to_datetime("01-01-1990"),
... pd.to_datetime("06-01-2010")]
>>> age_under_25(input_ages, time=reference_date).tolist()
[False, True]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Boolean

Requirements

featuretools>=0.5.1

AgeUnder65

Determines whether a person is under 65 years old given their date of birth.

Description

Returns True if the person's age is less than 65 years. Returns False if the age is more than or equal to 65 years. Returns nan if the age is not defined, or doesn't exist.

Examples

Determine whether two people are under age 65 as of January 1, 2019.

>>> import pandas as pd
>>> reference_date = pd.to_datetime("01-01-2019")
>>> age_under_65 = AgeUnder65()
>>> input_ages = [pd.to_datetime("01-01-1950"),
... pd.to_datetime("06-01-2010")]
>>> age_under_65(input_ages, time=reference_date).tolist()
[False, True]

Type

  • Transform

Properties

  • Input Types:

    • DateOfBirth

  • Return Type:

    • Boolean

Requirements

featuretools>=0.5.1

Autocorrelation

Determines the Pearson correlation between a series and a shifted version of the series.

Examples

>>> autocorrelation = Autocorrelation()
>>> round(autocorrelation([1, 2, 3, 1, 3, 2]), 3)
-0.598

The number of periods to shift the input before performing correlation can be specified.

>>> autocorrelation_lag = Autocorrelation(lag=3)
>>> autocorrelation_lag([1, 2, 3, 1, 2, 3])
1.0
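To make the "series versus shifted series" idea concrete, here is a small standard-library sketch. The helper names `pearson` and `autocorrelation` are hypothetical, and the sketch assumes the input has no missing values and that `lag` is at least 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def autocorrelation(values, lag=1):
    """Correlate each value with the value `lag` positions earlier."""
    # Dropping the first `lag` values from one copy and the last `lag`
    # from the other keeps only the positions where both are defined.
    return pearson(values[lag:], values[:-lag])

print(round(autocorrelation([1, 2, 3, 1, 3, 2]), 3))
print(autocorrelation([1, 2, 3, 1, 2, 3], lag=3))
```

With a lag of 3, the second series repeats itself exactly, so the shifted and unshifted parts are identical and the correlation is 1.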

Type

  • Aggregation

Arguments

  • lag (int)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

AverageCountPerUnique

Determines the average count across all unique values.

Examples

Determine the average count values for all unique items in the input

>>> input = [1, 1, 2, 2, 3, 4, 5, 6, 7, 8]
>>> avg_count_per_unique = AverageCountPerUnique()
>>> avg_count_per_unique(input)
1.25

Determine the average count values for all unique items in the input with nan values ignored

>>> input = [1, 1, 2, 2, 3, 4, 5, None, 6, 7, 8]
>>> avg_count_per_unique = AverageCountPerUnique()
>>> avg_count_per_unique(input)
1.25

Determine the average count values for all unique items in the input with nan values included

>>> input = [1, 2, 2, 3, 4, 5, None, 6, 7, 8, 9]
>>> avg_count_per_unique_skipna_false = AverageCountPerUnique(skipna=False)
>>> avg_count_per_unique_skipna_false(input)
1.1
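The three results above are consistent with dividing the number of items by the number of distinct values. A minimal sketch under that assumption follows; `average_count_per_unique` is a hypothetical helper, and missing values are assumed to be represented by `None`.

```python
from collections import Counter

def average_count_per_unique(values, skipna=True):
    """Mean number of occurrences per distinct value."""
    if skipna:
        values = [v for v in values if v is not None]
    counts = Counter(values)  # with skipna=False, None counts as its own value
    return len(values) / len(counts)

print(average_count_per_unique([1, 1, 2, 2, 3, 4, 5, 6, 7, 8]))
print(average_count_per_unique([1, 2, 2, 3, 4, 5, None, 6, 7, 8, 9], skipna=False))
```

In the last case there are 11 items and 10 distinct values (nine numbers plus the missing marker), which gives 1.1. The sketch does not guard against empty input.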

Type

  • Aggregation

Arguments

  • skipna (bool)

Properties

  • Input Types:

    • Discrete

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CityblockDistance

Calculates the distance between points in a city road grid.

Description

This distance is calculated using the haversine formula, which takes into account the curvature of the Earth. If either input contains NaNs, the calculated distance will be NaN. This calculation is also known as the Manhattan distance.

Examples

>>> cityblock_distance = CityblockDistance()
>>> DC = (38, -77)
>>> Boston = (43, -71)
>>> NYC = (40, -74)
>>> cityblock_distance([DC, DC], [NYC, Boston]).tolist()
[301.51883606126523, 672.0886239902864]

We can also change the units in which the distance is calculated.

>>> cityblock_distance_kilometers = CityblockDistance(unit='kilometers')
>>> cityblock_distance_kilometers([DC, DC], [NYC, Boston]).tolist()
[485.24753384652865, 1081.621803724818]
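One way to combine the haversine formula with a city-block path is to sum a north-south leg and an east-west leg. The sketch below is an assumption about how the numbers above could arise, not the primitive's confirmed implementation: the Earth radii, the choice to measure the east-west leg at the first point's latitude, and the helper names are all illustrative.

```python
import math

# Assumed mean Earth radii; the library's actual constants may differ.
EARTH_RADIUS = {"miles": 3958.7613, "kilometers": 6371.0088}

def haversine(p1, p2, unit="miles"):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS[unit] * math.asin(math.sqrt(a))

def cityblock_distance(p1, p2, unit="miles"):
    """Sum of a latitude-only leg and a longitude-only leg, both starting at p1."""
    lat_leg = haversine(p1, (p2[0], p1[1]), unit)  # move north/south only
    lon_leg = haversine(p1, (p1[0], p2[1]), unit)  # move east/west only
    return lat_leg + lon_leg

DC, NYC = (38, -77), (40, -74)
print(cityblock_distance(DC, NYC))
print(cityblock_distance(DC, NYC, unit="kilometers"))
```

Under these assumptions the DC to NYC result lands close to the documented value of roughly 301.5 miles.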

Type

  • Transform

Arguments

  • unit (str)

Properties

  • Commutative:

    • True

  • Input Types:

    • LatLong

    • LatLong

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

Correlation

Computes the correlation between two columns of values.

Examples

>>> correlation = Correlation()
>>> array_1 = [1, 4, 6, 7]
>>> array_2 = [1, 5, 9, 7]
>>> correlation(array_1, array_2)
0.9221388919541468

We can also use different methods of computation.

>>> correlation_pearson = Correlation(method='pearson')
>>> correlation_pearson(array_1, array_2)
0.9221388919541468
>>> correlation_spearman = Correlation(method='spearman')
>>> correlation_spearman(array_1, array_2)
0.7999999999999999
>>> correlation_kendall = Correlation(method='kendall')
>>> correlation_kendall(array_1, array_2)
0.6666666666666669

Type

  • Aggregation

Arguments

  • method (str)

Properties

  • Input Types:

    • Numeric

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountAboveMean

Calculates the number of values that are above the mean.

Examples

>>> count_above_mean = CountAboveMean()
>>> count_above_mean([1, 2, 3, 4, 5])
2

The way NaNs are treated can be controlled.

>>> count_above_mean_skipna = CountAboveMean(skipna=False)
>>> count_above_mean_skipna([1, 2, 3, 4, 5, None])
nan

Type

  • Aggregation

Arguments

  • skipna (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountBelowMean

Determines the number of values that are below the mean.

Examples

>>> count_below_mean = CountBelowMean()
>>> count_below_mean([1, 2, 3, 4, 10])
3

The way NaNs are treated can be controlled.

>>> count_below_mean_skipna = CountBelowMean(skipna=False)
>>> count_below_mean_skipna([1, 2, 3, 4, 5, None])
nan

Type

  • Aggregation

Arguments

  • skipna (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountGreaterThan

Determines the number of values greater than a controllable threshold.

Examples

>>> count_greater_than = CountGreaterThan(threshold=3)
>>> count_greater_than([1, 2, 3, 4, 5])
2

Type

  • Aggregation

Arguments

  • threshold (float)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountInsideNthSTD

Determines the count of observations that lie inside the first N standard deviations (inclusive).

Examples

>>> count_inside_nth_std = CountInsideNthSTD(n=1.5)
>>> count_inside_nth_std([1, 10, 15, 20, 100])
4
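The count above can be checked by hand: a value is "inside" when its distance from the mean is at most n standard deviations. The following sketch assumes the sample standard deviation is used (the source does not say which estimator the primitive applies); `count_inside_nth_std` is a hypothetical helper name.

```python
import statistics

def count_inside_nth_std(values, n=1):
    """Count values within n standard deviations of the mean (inclusive)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return sum(1 for v in values if abs(v - mean) <= n * std)

data = [1, 10, 15, 20, 100]
inside = count_inside_nth_std(data, n=1.5)
outside = len(data) - inside
print(inside, outside)
```

For this data the mean is 29.2 and only the value 100 lies more than 1.5 standard deviations away, so four values are inside and one is outside.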

Type

  • Aggregation

Arguments

  • n (float)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountInsideRange

Determines the number of values that fall within a certain range.

Examples

>>> count_inside_range = CountInsideRange(lower=1.5, upper=3.6)
>>> count_inside_range([1, 2, 3, 4, 5])
2

The way NaNs are treated can be controlled.

>>> count_inside_range_skipna = CountInsideRange(skipna=False)
>>> count_inside_range_skipna([1, 2, 3, 4, 5, None])
nan

Type

  • Aggregation

Arguments

  • lower (float)

  • upper (float)

  • skipna (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountLessThan

Determines the number of values less than a controllable threshold.

Examples

>>> count_less_than = CountLessThan(threshold=3.5)
>>> count_less_than([1, 2, 3, 4, 5])
3

Type

  • Aggregation

Arguments

  • threshold (float)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountOutsideNthSTD

Determines the number of observations that lie outside the first N standard deviations.

Examples

>>> count_outside_nth_std = CountOutsideNthSTD(n=1.5)
>>> count_outside_nth_std([1, 10, 15, 20, 100])
1

Type

  • Aggregation

Arguments

  • n (float)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountOutsideRange

Determines the number of values that fall outside a certain range.

Examples

>>> count_outside_range = CountOutsideRange(lower=1.5, upper=3.6)
>>> count_outside_range([1, 2, 3, 4, 5])
3

The way NaNs are treated can be controlled.

>>> count_outside_range_skipna = CountOutsideRange(skipna=False)
>>> count_outside_range_skipna([1, 2, 3, 4, 5, None])
nan

Type

  • Aggregation

Arguments

  • lower (float)

  • upper (float)

  • skipna (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountString

Determines how many times a given string shows up in a text field.

Examples

>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
... "He was there.",
... "The girl went to the store."]).tolist()
[1, 1, 2]

Match case of string

>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
... "He was there.",
... "The girl went to the store."]).tolist()
[0, 1, 1]

Ignore non-alphanumeric characters in the search

>>> count_string_ignore_non_alphanumeric = CountString(string="the",
... ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
... "He was there.",
... "The girl went to the store."]).tolist()
[1, 1, 2]

Specify the string as a regex

>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
... "He was there.",
... "The girl went to the store."]).tolist()
[1, 1, 2]

Match whole words only

>>> count_string_match_whole_words_only = CountString(string="the",
... match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
... "He was there.",
... "The girl went to the store."]).tolist()
[1, 0, 2]
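The options shown above map naturally onto regular-expression features. The sketch below is one possible way to reproduce the documented counts with Python's `re` module; the function `count_string` and the defaults chosen here (case-insensitive unless told otherwise) are assumptions inferred from the examples, not the library's code.

```python
import re

def count_string(text, string, ignore_case=True, ignore_non_alphanumeric=False,
                 is_regex=False, match_whole_words_only=False):
    """Count non-overlapping occurrences of `string` in `text`."""
    if ignore_non_alphanumeric:
        # Drop everything except letters, digits, and spaces before searching.
        text = re.sub(r"[^0-9a-zA-Z ]", "", text)
    pattern = string if is_regex else re.escape(string)
    if match_whole_words_only:
        pattern = r"\b" + pattern + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(pattern, text, flags))

sentences = ["The problem was difficult.",
             "He was there.",
             "The girl went to the store."]
print([count_string(s, "the") for s in sentences])
print([count_string(s, "the", match_whole_words_only=True) for s in sentences])
```

The whole-word variant relies on the `\b` word-boundary anchor, which is why "there" no longer counts as a match in the second sentence.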

Type

  • Transform

Arguments

  • string (str)

  • ignore_case (bool)

  • ignore_non_alphanumeric (bool)

  • is_regex (bool)

  • match_whole_words_only (bool)

Properties

  • Input Types:

    • Text

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CountryCodeToIncome

Transforms a 2-letter or 3-letter ISO-3166-1 country code into Gross National Income (GNI) per capita.

Description

The GNI per capita data was obtained from The World Bank (https://data.worldbank.org/indicator/NY.GNP.PCAP.CD). The GNI data uses 3-letter country codes. In order to use 2-letter codes, the GNI data was merged with a list of country codes obtained from Wikipedia (https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

Examples

>>> country_code_to_income = CountryCodeToIncome()
>>> country_code_to_income(['USA', 'AM', 'EC']).tolist()
[58270.0, 3990.0, 5920.0]

Type

  • Transform

Properties

  • Input Types:

    • CountryCode

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CumulativeTimeSinceLastFalse

Determines the time since last False value.

Description

Given a list of booleans and a list of corresponding datetimes, determine the time at each point since the last False value. Returns time difference in seconds. NaN values are ignored.
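The running calculation described above can be sketched as a single pass that remembers the timestamp of the most recent False. This is an illustrative sketch; the helper name is hypothetical, and returning nan before the first False has been seen is an assumption.

```python
import math
from datetime import datetime

def cumulative_time_since_last_false(datetimes, booleans):
    """Seconds elapsed at each row since the most recent False value."""
    out = []
    last_false = None
    for ts, flag in zip(datetimes, booleans):
        if flag is False:  # None/nan entries are ignored and do not reset the clock
            last_false = ts
        if last_false is None:
            out.append(math.nan)  # no False seen yet (assumed behavior)
        else:
            out.append((ts - last_false).total_seconds())
    return out

times = [datetime(2011, 4, 9, 10, 30, 0),
         datetime(2011, 4, 9, 10, 30, 10),
         datetime(2011, 4, 9, 10, 30, 15),
         datetime(2011, 4, 9, 10, 30, 29)]
print(cumulative_time_since_last_false(times, [False, True, False, True]))
```

Each False row resets the elapsed time to zero, and every following row reports the seconds since that reset.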

Examples

>>> from datetime import datetime
>>> cumulative_time_since_last_false = CumulativeTimeSinceLastFalse()
>>> booleans = [False, True, False, True]
>>> datetimes = [
... datetime(2011, 4, 9, 10, 30, 0),
... datetime(2011, 4, 9, 10, 30, 10),
... datetime(2011, 4, 9, 10, 30, 15),
... datetime(2011, 4, 9, 10, 30, 29)
... ]
>>> cumulative_time_since_last_false(datetimes, booleans).tolist()
[0.0, 10.0, 0.0, 14.0]

Type

  • Transform

Properties

  • Input Types:

    • DatetimeTimeIndex

    • Boolean

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

CumulativeTimeSinceLastTrue

Determines the time (in seconds) since the last boolean was True, given a datetime index column and a boolean column.

Examples

>>> from datetime import datetime
>>> cumulative_time_since_last_true = CumulativeTimeSinceLastTrue()
>>> booleans = [False, True, False, True]
>>> datetimes = [
... datetime(2011, 4, 9, 10, 30, 0),
... datetime(2011, 4, 9, 10, 30, 10),
... datetime(2011, 4, 9, 10, 30, 15),
... datetime(2011, 4, 9, 10, 30, 30)
... ]
>>> cumulative_time_since_last_true(datetimes, booleans).tolist()
[nan, 0.0, 5.0, 0.0]

Type

  • Transform

Properties

  • Input Types:

    • DatetimeTimeIndex

    • Boolean

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

DateFirstEvent

Determines the first datetime from a list of datetimes.

Examples

>>> from datetime import datetime
>>> date_first_event = DateFirstEvent()
>>> date_first_event([
... datetime(2011, 4, 9, 10, 30, 10),
... datetime(2011, 4, 9, 10, 30, 20),
... datetime(2011, 4, 9, 10, 30, 30)])
Timestamp('2011-04-09 10:30:10')

Type

  • Aggregation

Properties

  • Input Types:

    • DatetimeTimeIndex

  • Return Type:

    • Datetime

Requirements

featuretools>=0.5.1

DateToHoliday

Transforms time of an instance into the holiday name, if there is one.

Description

If there is no holiday, it returns NaN. Currently only works for the United States and Canada with dates between 1800 and 2199.

Examples

>>> import pandas as pd
>>> from datetime import datetime
>>> date_to_holiday = DateToHoliday()
>>> dates = pd.Series([datetime(2016, 1, 1),
... datetime(2016, 2, 27),
... datetime(2017, 5, 29, 10, 30, 5),
... datetime(2018, 7, 4)])
>>> date_to_holiday(dates).tolist()
["New Year's Day", nan, 'Memorial Day', 'Independence Day']

We can also change the country.

>>> date_to_holiday_canada = DateToHoliday(country='Canada')
>>> dates = pd.Series([datetime(2016, 7, 1),
... datetime(2016, 11, 15),
... datetime(2017, 12, 26),
... datetime(2018, 9, 3)])
>>> date_to_holiday_canada(dates).tolist()
['Canada Day', nan, 'Boxing Day', 'Labour Day']

Type

  • Transform

Arguments

  • country (str)

Properties

  • Input Types:

    • Datetime

  • Return Type:

    • Categorical

Requirements

featuretools>=0.5.1
holidays>=0.9.9

DateToTimeZone

Determines the timezone of a datetime.

Description

Given a list of datetimes, extract the timezone from each one. Looks for the tzinfo attribute on datetime.datetime objects. If the datetime has no timezone or the date is missing, return NaN.

Examples

>>> from datetime import datetime
>>> from pytz import timezone
>>> date_to_time_zone = DateToTimeZone()
>>> dates = [datetime(2010, 1, 1, tzinfo=timezone("America/Los_Angeles")),
... datetime(2010, 1, 1, tzinfo=timezone("America/New_York")),
... datetime(2010, 1, 1, tzinfo=timezone("America/Chicago")),
... datetime(2010, 1, 1)]
>>> date_to_time_zone(dates).tolist()
['America/Los_Angeles', 'America/New_York', 'America/Chicago', nan]

Type

  • Transform

Properties

  • Input Types:

    • Datetime

  • Return Type:

    • Categorical

Requirements

featuretools>=0.5.1
pytz>=2018.9

Elmo

Transforms a sentence or short paragraph using deep contextualized language representations. Uses a pre-trained model from TensorFlow Hub (tfhub).

Examples

>>> elmo = Elmo()
>>> words = ["I like to eat pizza",
... "The roller coaster was built in 1885.",
... "When will humans go to mars?"]
>>> output = elmo(words)
>>> len(output)
1024
>>> len(output[0])
3
>>> values = output[:3, 0]
>>> [round(x, 4) for x in values]
[-0.3457, -0.4546, 0.2538]

Type

  • Transform

Properties

  • Input Types:

    • Text

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1
tensorflow>=1.12.0
tensorflow_hub>=0.2.0

ExponentialWeightedAverage

Computes the exponentially weighted moving average for a series of numbers.

Description

Returns the exponentially weighted moving average for a series of numbers. Exactly one of center of mass (com), span, half-life, and alpha must be provided. Missing values can be ignored when calculating weights by setting 'ignore_na' to True.
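The four decay parameters are alternative ways of specifying one smoothing factor alpha. The sketch below uses the conversions documented for pandas exponentially weighted windows and the bias-adjusted weighting (weights of (1 - alpha)^i normalized by their sum). It is an illustration that assumes the primitive follows the same conventions; the helper names are hypothetical and missing values are not handled.

```python
import math

def alpha_from(com=None, span=None, halflife=None, alpha=None):
    """Convert whichever decay parameter was given into a smoothing factor."""
    given = [p for p in (com, span, halflife, alpha) if p is not None]
    if len(given) != 1:
        raise ValueError("exactly one of com, span, halflife, alpha is required")
    if com is not None:
        return 1 / (1 + com)
    if span is not None:
        return 2 / (span + 1)
    if halflife is not None:
        return 1 - math.exp(-math.log(2) / halflife)
    return alpha

def ewma(values, **decay):
    """Bias-adjusted exponentially weighted moving average."""
    a = alpha_from(**decay)
    out = []
    num = den = 0.0
    for x in values:
        # Older observations are down-weighted by (1 - a) at every step.
        num = x + (1 - a) * num
        den = 1 + (1 - a) * den
        out.append(num / den)
    return out

print(ewma([1, 2, 3, 4], com=0.5))
```

With com=0.5 the smoothing factor is 2/3, so each older observation carries one third of the weight of the one after it.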

Examples

>>> exponential_weighted_average = ExponentialWeightedAverage(com=0.5)
>>> exponential_weighted_average([1, 2, 3, 4]).tolist()
[1.0, 1.75, 2.615384615384615, 3.55]

Missing values can be ignored

>>> ewma_ignorena = ExponentialWeightedAverage(com=0.5, ignore_na=True)
>>> ewma_ignorena([1, 2, 3, None, 4]).tolist()
[1.0, 1.75, 2.615384615384615, 2.615384615384615, 3.55]

Type

  • Transform

Arguments

  • com (float)

  • span (float)

  • halflife (float)

  • alpha (float)

  • ignore_na (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

ExponentialWeightedSTD

Computes the exponentially weighted moving standard deviation for a series of numbers.

Description

Returns the exponentially weighted moving standard deviation for a series of numbers. Exactly one of center of mass (com), span, half-life, and alpha must be provided. Missing values can be ignored when calculating weights by setting 'ignore_na' to True.

Examples

>>> exponential_weighted_std = ExponentialWeightedSTD(com=0.5)
>>> exponential_weighted_std([1, 2, 3, 7]).tolist()
[nan, 0.7071067811865475, 0.9198662110077998, 2.9852200022005855]

Missing values can be ignored

>>> ewmstd_ignorena = ExponentialWeightedSTD(com=0.5, ignore_na=True)
>>> ewmstd_ignorena([1, 2, 3, None, 7]).tolist()
[nan, 0.7071067811865475, 0.9198662110077998, 0.9198662110077998, 2.9852200022005855]

Type

  • Transform

Arguments

  • com (float)

  • span (float)

  • halflife (float)

  • alpha (float)

  • ignore_na (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

ExponentialWeightedVariance

Computes the exponentially weighted moving variance for a series of numbers.

Description

Returns the exponentially weighted moving variance for a series of numbers. Exactly one of center of mass (com), span, half-life, and alpha must be provided. Missing values can be ignored when calculating weights by setting 'ignore_na' to True.

Examples

>>> exponential_weighted_variance = ExponentialWeightedVariance(com=0.5)
>>> exponential_weighted_variance([1, 2, 3, 4]).tolist()
[nan, 0.49999999999999983, 0.8461538461538459, 1.1230769230769233]

Missing values can be ignored

>>> ewmv_ignorena = ExponentialWeightedVariance(com=0.5, ignore_na=True)
>>> ewmv_ignorena([1, 2, 3, None, 4]).tolist()
[nan, 0.49999999999999983, 0.8461538461538459, 0.8461538461538459, 1.1230769230769233]

Type

  • Transform

Arguments

  • com (float)

  • span (float)

  • halflife (float)

  • alpha (float)

  • ignore_na (bool)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

FileExtension

Determines the extension of a filepath.

Description

Given a list of filepaths, return the extension suffix of each one. If the filepath is missing or invalid, return NaN.
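The same extraction can be approximated with the standard library's `pathlib`. This is a sketch only; `file_extension` is a hypothetical helper, and treating non-string input as missing is an assumption.

```python
from pathlib import PurePosixPath

NAN = float("nan")

def file_extension(path):
    """Return the final suffix of a path (including the dot), or nan if none."""
    if not isinstance(path, str):
        return NAN  # missing or invalid input (assumed handling)
    suffix = PurePosixPath(path).suffix
    return suffix if suffix else NAN

print([file_extension(p) for p in ["doc.txt", "~/documents/data.json", "file"]])
```

Note that `suffix` returns only the last extension, so a name such as `archive.tar.gz` would yield `.gz` in this sketch.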

Examples

>>> file_extension = FileExtension()
>>> file_extension(['doc.txt', '~/documents/data.json', 'file']).tolist()
['.txt', '.json', nan]

Type

  • Transform

Properties

  • Input Types:

    • FilePath

  • Return Type:

    • Categorical

Requirements

featuretools>=0.5.1

FirstLastTimeDelta

Determines the time between the first and last time value in seconds.

Examples

>>> from datetime import datetime
>>> first_last_time_delta = FirstLastTimeDelta()
>>> first_last_time_delta([
... datetime(2011, 4, 9, 10, 30, 0),
... datetime(2011, 4, 9, 10, 30, 15),
... datetime(2011, 4, 9, 10, 30, 35)])
35.0

Type

  • Aggregation

Properties

  • Input Types:

    • DatetimeTimeIndex

  • Return Type:

    • Numeric

Requirements

featuretools>=0.5.1

FullNameToFirstName

Determines the first name from a person's name.

Description

Given a list of names, determines the first name. If only a single name is provided, it is assumed to be a first name. If only a title and a single name are provided, nan is returned. This assumes all titles will be followed by a period. Please note that, in the current implementation, last names containing spaces may result in improper first name matches.

Examples

>>> full_name_to_first_name = FullNameToFirstName()
>>> names = ['Woolf Spector', 'Oliva y Ocana, Dona. Fermina',
... 'Ware, Mr. Frederick', 'Peter, Michael J', 'Mr. Brown']
>>> full_name_to_first_name(names).to_list()
['Woolf', 'Oliva', 'Frederick', 'Michael', nan]

Type

  • Transform

Properties

  • Input Types:

    • FullName

  • Return Type:

    • Categorical

Requirements

featuretools>=0.5.1

FullNameToLastName

Determines the last name from a person's name.

Description

Given a list of names, determines the last name. If only a single name is provided, assume this is a first name, and return nan. This assumes all titles will be followed by a period.

Examples

>>> full_name_to_last_name = FullNameToLastName()
>>> names = ['Woolf Spector', 'Oliva y Ocana, Dona. Fermina',
... 'Ware, Mr. Frederick', 'Peter, Michael J', 'Mr. Brown']
>>> full_name_to_last_name(names).to_list()
['Spector', 'Oliva y Ocana', 'Ware', 'Peter', 'Brown']

Type

  • Transform

Properties

  • Input Types:

    • FullName

  • Return Type:

    • Categorical

Requirements

featuretools>=0.5.1

FullNameToTitle

Determines the title from a person's name.

Description

Given a list of names, determines the title, or prefix of each name (e.g. "Mr", "Mrs", etc). If no title is found, returns NaN.
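A simple way to picture the title extraction is a regular expression that captures the first word immediately followed by a period. The pattern and the helper name `full_name_to_title` below are assumptions for illustration; the library's actual matching rules are not given in this reference.

```python
import re

NAN = float("nan")
# A run of letters directly followed by a period, e.g. "Mr." or "Dona."
TITLE_PATTERN = re.compile(r"\b([A-Za-z]+)\.")

def full_name_to_title(name):
    """Return the first period-terminated word in the name, or nan if none."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else NAN

names = ["Spector, Mr. Woolf", "Oliva y Ocana, Dona. Fermina",
         "Ware, Mr. Frederick", "Peter, Michael J", "Mr. Brown"]
print([full_name_to_title(n) for n in names])
```

A pattern this simple would also treat a period-terminated middle initial (for example "Michael J.") as a title, so a real implementation would need a stricter rule.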

Examples

>>> full_name_to_title = FullNameToTitle()
>>> names = ['Spector, Mr. Woolf', 'Oliva y Ocana, Dona. Fermina',
... 'Ware, Mr. Frederick', 'Peter, Michael J', 'Mr. Brown']
>>> full_name_to_title(names).to_list()
['Mr', 'Dona', 'Mr', nan, 'Mr']

Type

  • Transform

Properties

  • Input Types:

    • FullName

  • Return Type:

    • Categorical

Requirements

featuretools>=0.5.1

GeoMidpoint

Determines the geographic center of two coordinates.

Examples

>>> geomidpoint = GeoMidpoint()
>>> geomidpoint([(42.4, -71.1)], [(40.0, -122.4)])
[(41.2, -96.75)]
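The example output is consistent with averaging the latitudes and the longitudes separately. The sketch below assumes exactly that; `geo_midpoint` is a hypothetical helper, and this simple average is not the great-circle midpoint.

```python
def geo_midpoint(p1, p2):
    """Arithmetic mean of two (latitude, longitude) pairs."""
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

lat, lon = geo_midpoint((42.4, -71.1), (40.0, -122.4))
print(round(lat, 2), round(lon, 2))
```

Because addition is symmetric, swapping the two inputs gives the same result, which matches the Commutative property listed for this primitive.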

Type

  • Transform

Properties

  • Commutative:

    • True

  • Input Types:

    • LatLong

    • LatLong

  • Return Type:

    • LatLong

Requirements

featuretools>=0.5.1

GreaterThanPrevious

Determines if a value is greater than the previous value in a list.

Description

Compares a value in a list to the previous value and returns True if the value is greater than the previous value or False otherwise. The first item in the output will always be False, since there is no previous element for the first element comparison. Any nan values in the input will be filled using either a forward-fill or backward-fill method, specified by the fill_method argument. The number of consecutive nan values that get filled can be limited with the limit argument. Any nan values left after filling will result in False being returned for any comparison involving the nan value.
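The comparison rule described above, including the treatment of nan values that remain after filling, can be sketched in a few lines. The helper below is hypothetical and takes a list that has already been filled; any remaining gaps are assumed to be represented as nan.

```python
import math

def greater_than_previous(filled):
    """Compare each value with its predecessor; the first result is False."""
    if not filled:
        return []
    out = [False]  # there is nothing to compare the first element with
    for prev, cur in zip(filled, filled[1:]):
        # Any ordering comparison involving nan evaluates to False.
        out.append(cur > prev)
    return out

# [1, None, None, None, 2, 3] after a forward fill limited to 2 values:
after_fill = [1, 1, 1, math.nan, 2, 3]
print(greater_than_previous(after_fill))
```

The unfilled nan makes two comparisons False: the one at its own position and the one at the following position, where it serves as the previous value.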

Examples

>>> greater_than_previous = GreaterThanPrevious()
>>> greater_than_previous([1, 2, 1, 4]).tolist()
[False, True, False, True]

The fill method for nan values can be specified

>>> greater_than_previous_fillna = GreaterThanPrevious(fill_method="bfill")
>>> greater_than_previous_fillna([1, None, 2, 4]).tolist()
[False, True, False, True]

The number of nan values that are filled can be limited

>>> greater_than_previous_limitfill = GreaterThanPrevious(limit=2)
>>> greater_than_previous_limitfill([1, None, None, None, 2, 3]).tolist()
[False, False, False, False, False, True]

Type

  • Transform

Arguments

  • fill_method (str)

  • limit (int)

Properties

  • Input Types:

    • Numeric

  • Return Type:

    • Numeric

Requirements