epic_kitchens package¶
This library contains a variety of useful classes and functions for performing common operations on the EPIC Kitchens dataset.
epic_kitchens.dataset¶
epic_kitchens.dataset.epic_dataset module¶
-
class epic_kitchens.dataset.epic_dataset.EpicVideoDataset(gulp_path, class_type, *, with_metadata=False, class_getter=None, segment_filter=None, sample_transform=None)[source]¶
Bases: epic_kitchens.dataset.video_dataset.VideoDataset
VideoDataset for gulped RGB frames
-
__init__(gulp_path, class_type, *, with_metadata=False, class_getter=None, segment_filter=None, sample_transform=None)[source]¶
- Parameters
gulp_path (Union[Path, str]) – Path to gulp directory containing the gulped EPIC RGB or flow frames.
class_type (str) – One of verb, noun, verb+noun, None; determines what label the segment returns. None should be used for loading test datasets.
with_metadata (bool) – When True the segments will yield a tuple (metadata, class) where the class is defined by the class getter and the metadata is the raw dictionary stored in the gulp file.
class_getter (Optional[Callable[[Dict[str, Any]], Any]]) – Optionally provide a callable that takes in the gulp dict representing the segment and returns the class you wish the segment to have.
segment_filter (Optional[Callable[[VideoSegment], bool]]) – Optionally provide a callable that takes a segment and returns True if you want to keep the segment in the dataset, or False if you wish to exclude it.
sample_transform (Optional[Callable[[List[Image]], List[Image]]]) – Optionally provide a sample transform function which takes a list of PIL images and transforms each of them. This is applied to the frames just before they are returned from load_frames().
- Return type
None
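As an illustration of the class_getter and segment_filter hooks, here is a hypothetical pair of callables. The metadata keys verb_class, noun_class and num_frames mirror label fields documented elsewhere on this page, but the stand-in objects below are illustrative only, and the commented-out EpicVideoDataset construction is an unexecuted sketch, not verified against the library:

```python
from types import SimpleNamespace

def verb_noun_getter(metadata):
    """Label each segment with a (verb_class, noun_class) pair."""
    return (metadata["verb_class"], metadata["noun_class"])

def long_segments_only(segment):
    """Keep only segments with at least 16 frames."""
    return segment.num_frames >= 16

# Stand-ins for a gulped metadata dict and a VideoSegment:
metadata = {"verb_class": 3, "noun_class": 10, "num_frames": 93}
clip = SimpleNamespace(num_frames=93)

label = verb_noun_getter(metadata)   # (3, 10)
keep = long_segments_only(clip)      # True

# dataset = EpicVideoDataset(gulp_path, "verb+noun",
#                            class_getter=verb_noun_getter,
#                            segment_filter=long_segments_only)
```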
-
property video_segments¶
List of video segments that are present in the dataset. These describe the start and stop times of the clip and its class.
-
class epic_kitchens.dataset.epic_dataset.EpicVideoFlowDataset(gulp_path, class_type, *, with_metadata=False, class_getter=None, segment_filter=None, sample_transform=None)[source]¶
Bases: epic_kitchens.dataset.epic_dataset.EpicVideoDataset
VideoDataset for loading gulped flow. The loader assumes that flow \(u\), \(v\) frames are stored alternately in a flat manner: \([u_0, v_0, u_1, v_1, \ldots, u_n, v_n]\)
-
class epic_kitchens.dataset.epic_dataset.GulpVideoSegment(gulp_metadata_dict, class_getter)[source]¶
Bases: epic_kitchens.dataset.video_dataset.VideoSegment
SegmentRecord for a video segment stored in a gulp file.
Assumes that the video segment has the following metadata in the gulp file: id, num_frames
epic_kitchens.dataset.video_dataset module¶
-
class epic_kitchens.dataset.video_dataset.VideoDataset(class_count, segment_filter=None, sample_transform=None)[source]¶
Bases: abc.ABC
A dataset interface for use with TsnDataset. Implement this interface if you wish to use your dataset with TSN.
We cannot use torch.utils.data.Dataset because we need to yield information about the number of frames per video, which we can't do with the standard torch.utils.data.Dataset.
-
property video_segments¶
epic_kitchens.gulp¶
Dataset Adapters for GulpIO.
This module contains two adapters for 'gulping' both RGB and flow frames, which can then be used with the EpicVideoDataset classes.
epic_kitchens.gulp.adapter¶
-
class epic_kitchens.gulp.adapter.EpicDatasetAdapter(video_segment_dir, annotations_df, frame_size=-1, extension='jpg', labelled=True)[source]¶
Bases: gulpio.adapters.AbstractDatasetAdapter
Gulp Dataset Adapter for gulping RGB frames extracted from the EPIC-KITCHENS dataset
-
__init__(video_segment_dir, annotations_df, frame_size=-1, extension='jpg', labelled=True)[source]¶
Gulp all action segments in annotations_df, reading the dumped frames from video_segment_dir.
- Parameters
video_segment_dir (str) – Root directory containing segmented frames:

frame-segments/
├── P01
│   ├── P01_01
│   │   ├── P01_01_0_open-door
│   │   │   ├── frame_0000000008.jpg
│   │   │   ...
│   │   │   ├── frame_0000000202.jpg
│   │   ...
│   │   ├── P01_01_329_put-down-plate
│   │   │   ├── frame_0000098424.jpg
│   │   │   ...
│   │   │   ├── frame_0000098501.jpg
│   ...

annotations_df (DataFrame) – DataFrame containing labels to be gulped.
frame_size (int) – Size of shortest edge of the frame; if not already this size then it will be resized.
extension (str) – Extension of dumped frames.
- Return type
None
-
iter_data(slice_element=None)[source]¶
Get frames and metadata corresponding to segment.
- Parameters
slice_element (optional) – If not specified, all frames for the segment will be returned.
- Yields
dict – dictionary with the fields:
meta: All metadata corresponding to the segment; this is the same as the data in the labels csv.
frames: list of PIL.Image.Image corresponding to the frames specified in slice_element.
id: UID corresponding to segment.
-
class epic_kitchens.gulp.adapter.EpicFlowDatasetAdapter(video_segment_dir, annotations_df, frame_size=-1, extension='jpg', labelled=True)[source]¶
Bases: epic_kitchens.gulp.adapter.EpicDatasetAdapter
Gulp Dataset Adapter for gulping flow frames extracted from the EPIC-KITCHENS dataset
-
iter_data(slice_element=None)[source]¶
Get frames and metadata corresponding to segment.
- Parameters
slice_element (optional) – If not specified, all frames for the segment will be returned.
- Yields
dict – dictionary with the fields:
meta: All metadata corresponding to the segment; this is the same as the data in the labels csv.
frames: list of PIL.Image.Image corresponding to the frames specified in slice_element.
id: UID corresponding to segment.
-
epic_kitchens.gulp.visualisation¶
-
class epic_kitchens.gulp.visualisation.FlowVisualiser(dataset)[source]¶
Bases: epic_kitchens.gulp.visualisation.Visualiser
Visualiser for video dataset containing optical flow \((u, v)\) frames
-
class epic_kitchens.gulp.visualisation.RgbVisualiser(dataset)[source]¶
Bases: epic_kitchens.gulp.visualisation.Visualiser
Visualiser for video dataset containing RGB frames
-
epic_kitchens.gulp.visualisation.clipify_flow(frames, *, fps=30.0)[source]¶
Destack flow frames, join them side by side and then create a clip for display
epic_kitchens.meta¶
-
class epic_kitchens.meta.Action(verb, noun)¶
Bases: tuple
-
property noun¶
Alias for field number 1
-
property verb¶
Alias for field number 0
-
class epic_kitchens.meta.ActionClass(verb_class, noun_class)¶
Bases: tuple
-
property noun_class¶
Alias for field number 1
-
property verb_class¶
Alias for field number 0
-
epic_kitchens.meta.action_id_from_verb_noun(verb, noun)[source]¶
Map a verb and noun id to a dense action id.
Examples
>>> action_id_from_verb_noun(0, 0)
0
>>> action_id_from_verb_noun(0, 1)
1
>>> action_id_from_verb_noun(0, 351)
351
>>> action_id_from_verb_noun(1, 0)
352
>>> action_id_from_verb_noun(1, 1)
353
>>> action_id_from_verb_noun(np.array([0, 1, 2]), np.array([0, 1, 2]))
array([  0, 353, 706])
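Per these examples, and the \(c_v * 352 + c_n\) formula quoted for action_tuples_to_ids, the dense encoding can be sketched as follows. NUM_NOUN_CLASSES = 352 is taken from that formula, and dense_action_id is a hypothetical re-implementation for illustration, not the library function:

```python
import numpy as np

NUM_NOUN_CLASSES = 352  # from the documented formula c_v * 352 + c_n

def dense_action_id(verb, noun):
    """Encode a (verb, noun) class pair as a dense action id.

    Works on both Python ints and numpy arrays.
    """
    return verb * NUM_NOUN_CLASSES + noun

print(dense_action_id(1, 1))  # 353
print(dense_action_id(np.array([0, 1, 2]), np.array([0, 1, 2])))  # [  0 353 706]
```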
-
epic_kitchens.meta.action_tuples_to_ids(action_classes)[source]¶
Convert a list of action classes composed of a verb and noun class to a dense action id using the formula: \(c_v * 352 + c_n\)
- Parameters
action_classes (Iterable[ActionClass])
- Returns
action_ids
-
epic_kitchens.meta.class_to_noun(cls)[source]¶
- Parameters
cls (int) – numeric noun class
- Returns
Canonical noun representing the class
- Raises
IndexError – if cls is an invalid noun class
-
epic_kitchens.meta.class_to_verb(cls)[source]¶
- Parameters
cls (int) – numeric verb class
- Returns
Canonical verb representing the class
- Raises
IndexError – if cls is an invalid verb class
-
epic_kitchens.meta.get_datadir()[source]¶
- Returns
Directory under which any downloaded files are stored; defaults to current working directory
-
epic_kitchens.meta.is_many_shot_action(action_class)[source]¶
- Parameters
action_class (ActionClass) – (verb_class, noun_class) tuple
- Returns
Whether action_class is many shot or not
-
epic_kitchens.meta.many_shot_actions()[source]¶
- Returns
The set of action classes that are many shot (verb_class appears more than 100 times in training, noun_class appears more than 100 times in training, and the action appears at least once in training).
-
epic_kitchens.meta.noun_classes()[source]¶
Get dataframe containing the mapping between numeric noun classes, the canonical noun of that class, and the nouns clustered into the class.
- Return type
DataFrame
- Returns
Dataframe with the columns:

Column Name | Type                       | Example                   | Description
noun_id     | int                        | 2                         | ID of the noun class.
class_key   | string                     | pan:dust                  | Key of the noun class.
nouns       | list of string (1 or more) | "['pan:dust', 'dustpan']" | All nouns within the class (includes the key).
-
epic_kitchens.meta.noun_id_from_action_id(action)[source]¶
Decode an action id to a noun id.
Examples
>>> noun_id_from_action_id(0)
0
>>> noun_id_from_action_id(1)
1
>>> noun_id_from_action_id(351)
351
>>> noun_id_from_action_id(352)
0
>>> noun_id_from_action_id(353)
1
>>> noun_id_from_action_id(352 + 351)
351
>>> noun_id_from_action_id(np.array([0, 1, 353]))
array([0, 1, 1])
-
epic_kitchens.meta.noun_to_class(noun)[source]¶
- Parameters
noun (str) – A noun from a narration
- Returns
The corresponding numeric class of the noun if it exists
- Raises
IndexError – If the noun doesn't belong to any of the noun classes
-
epic_kitchens.meta.test_timestamps(split)[source]¶
- Parameters
split (str) – 'seen', 'unseen', or 'all' (loads both, with a 'split' column)
- Return type
DataFrame
- Returns
Dataframe with the columns:

Column Name     | Type   | Example      | Description
uid             | int    | 1924         | Unique ID of the segment.
participant_id  | string | P01          | ID of the participant.
video_id        | string | P01_11       | Video the segment is in.
start_timestamp | string | 00:00:00.000 | Start time in HH:mm:ss.SSS of the action.
stop_timestamp  | string | 00:00:01.890 | End time in HH:mm:ss.SSS of the action.
start_frame     | int    | 1            | Start frame of the action (WARNING: only for frames extracted as detailed in the annotations README).
stop_frame      | int    | 93           | End frame of the action (WARNING: only for frames extracted as detailed in the annotations README).
-
epic_kitchens.meta.training_labels()[source]¶
- Return type
DataFrame
- Returns
Dataframe with the columns:

Column Name     | Type                       | Example      | Description
uid             | int                        | 6374         | Unique ID of the segment.
video_id        | string                     | P03_01       | Video the segment is in.
narration       | string                     | close fridge | English description of the action provided by the participant.
start_timestamp | string                     | 00:23:43.847 | Start time in HH:mm:ss.SSS of the action.
stop_timestamp  | string                     | 00:23:47.212 | End time in HH:mm:ss.SSS of the action.
start_frame     | int                        | 85430        | Start frame of the action (WARNING: only for frames extracted as detailed in the annotations README).
stop_frame      | int                        | 85643        | End frame of the action (WARNING: only for frames extracted as detailed in the annotations README).
participant_id  | string                     | P03          | ID of the participant.
verb            | string                     | close        | Parsed verb from the narration.
noun            | string                     | fridge       | First parsed noun from the narration.
verb_class      | int                        | 3            | Numeric ID of the parsed verb's class.
noun_class      | int                        | 10           | Numeric ID of the parsed noun's class.
all_nouns       | list of string (1 or more) | ['fridge']   | List of all parsed nouns from the narration.
all_nouns_class | list of int (1 or more)    | [10]         | List of numeric IDs corresponding to all of the parsed nouns' classes from the narration.
-
epic_kitchens.meta.training_narrations()[source]¶
- Return type
DataFrame
- Returns
Dataframe with the columns:

Column Name     | Type   | Example      | Description
participant_id  | string | P03          | ID of the participant.
video_id        | string | P03_01       | Video the segment is in.
start_timestamp | string | 00:23:43.847 | Start time in HH:mm:ss.SSS of the narration.
stop_timestamp  | string | 00:23:47.212 | End time in HH:mm:ss.SSS of the narration.
narration       | string | close fridge | English description of the action provided by the participant.
-
epic_kitchens.meta.training_object_labels()[source]¶
- Return type
DataFrame
- Returns
Dataframe with the columns:

Column Name    | Type                        | Example                  | Description
noun_class     | int                         | 20                       | Integer value representing the class in noun-classes.csv.
noun           | string                      | bag                      | Original string name for the object.
participant_id | string                      | P01                      | ID of participant.
video_id       | string                      | P01_01                   | Video the object was annotated in.
frame          | int                         | 056581                   | Frame number of the annotated object.
bounding_boxes | list of 4-tuple (0 or more) | "[(76, 1260, 462, 186)]" | Annotated boxes with format (<top:int>,<left:int>,<height:int>,<width:int>).
-
epic_kitchens.meta.verb_classes()[source]¶
Get dataframe containing the mapping between numeric verb classes, the canonical verb of that class, and the verbs clustered into the class.
- Return type
DataFrame
- Returns
Dataframe with the columns:

Column Name | Type                       | Example                          | Description
verb_id     | int                        | 3                                | ID of the verb class.
class_key   | string                     | close                            | Key of the verb class.
verbs       | list of string (1 or more) | "['close', 'close-off', 'shut']" | All verbs within the class (includes the key).
-
epic_kitchens.meta.verb_id_from_action_id(action_id)[source]¶
Decode an action id to a verb id.
- Parameters
action_id (Union[int, ndarray]) – Either a single action id, or an np.ndarray of action ids.
Examples
>>> verb_id_from_action_id(0)
0
>>> verb_id_from_action_id(1)
0
>>> verb_id_from_action_id(352)
1
>>> verb_id_from_action_id(353)
1
>>> verb_id_from_action_id(np.array([0, 352, 1, 353]))
array([0, 1, 0, 1])
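The decoding direction of the \(c_v * 352 + c_n\) encoding follows from integer division and remainder; a hypothetical sketch (decode_action_id is an illustrative helper, not a library function):

```python
import numpy as np

NUM_NOUN_CLASSES = 352  # from the documented formula c_v * 352 + c_n

def decode_action_id(action_id):
    """Split a dense action id back into (verb_id, noun_id)."""
    return action_id // NUM_NOUN_CLASSES, action_id % NUM_NOUN_CLASSES

verb_id, noun_id = decode_action_id(353)             # (1, 1)
verbs, nouns = decode_action_id(np.array([0, 352]))  # ([0, 1], [0, 0])
```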
-
epic_kitchens.meta.verb_to_class(verb)[source]¶
- Parameters
verb (str) – A verb from a narration
- Returns
The corresponding numeric class of the verb if it exists
- Raises
IndexError – If the verb doesn't belong to any of the verb classes
-
epic_kitchens.meta.video_descriptions()[source]¶
- Return type
DataFrame
- Returns
High level description of the task being accomplished in each video.

Column Name | Type   | Example                                      | Description
video_id    | string | P01_01                                       | ID of the video.
date        | string | 30/04/2017                                   | Date on which the video was shot.
time        | string | 13:49:00                                     | Local recording time of the video.
description | string | prepared breakfast with soy milk and cereals | Description of the activities contained in the video.
-
epic_kitchens.meta.video_info()[source]¶
- Return type
DataFrame
- Returns
Technical information stating the resolution, duration and FPS of each video.

Column Name | Type   | Example          | Description
video       | string | P01_01           | Video ID.
resolution  | string | 1920x1080        | Resolution of the video; format is WIDTHxHEIGHT.
duration    | float  | 1652.152817      | Duration of the video, in seconds.
fps         | float  | 59.9400599400599 | Frame rate of the video.
epic_kitchens.metrics¶
-
epic_kitchens.metrics.compute_class_agnostic_metrics(groundtruth_df, ranks, many_shot_verbs=None, many_shot_nouns=None, many_shot_actions=None)[source]¶
Compute class agnostic metrics (many-shot precision and recall) from ranks.
- Parameters
groundtruth_df (DataFrame) – DataFrame containing 'verb_class': int, 'noun_class': int and 'action_class': int columns.
ranks (Dict[str, ndarray]) – Dictionary containing three entries: 'verb', 'noun' and 'action'. Entries should map to a 2D np.ndarray of shape (n_instances, n_classes) where the index is the predicted rank of the class at that index.
many_shot_verbs (Optional[ndarray]) – The set of verb classes that are considered many shot. If not provided they are loaded from epic_kitchens.meta.many_shot_verbs()
many_shot_nouns (Optional[ndarray]) – The set of noun classes that are considered many shot. If not provided they are loaded from epic_kitchens.meta.many_shot_nouns()
many_shot_actions (Optional[ndarray]) – The set of action classes that are considered many shot. If not provided they are loaded from epic_kitchens.meta.many_shot_actions()
- Returns
Dictionary with the structure:

precision:
    verb: float
    noun: float
    action: float
    verb_per_class: dict[str:float, length = n_verbs]
recall:
    verb: float
    noun: float
    action: float
    verb_per_class: dict[str:float, length = n_verbs]

The 'verb', 'noun', and 'action' entries of the metric dictionaries are the macro-averaged mean precision/recall over the set of many shot classes, whereas the 'verb_per_class' entry is a breakdown for each verb_class, in the format of a dictionary mapping stringified verb class to that class' precision/recall.
-
epic_kitchens.metrics.compute_class_aware_metrics(groundtruth_df, ranks, top_k=(1, 5))[source]¶
Compute class aware metrics (accuracy @ 1/5) from ranks.
- Parameters
groundtruth_df (DataFrame) – DataFrame containing 'verb_class': int, 'noun_class': int and 'action_class': int columns.
ranks (Dict[str, ndarray]) – Dictionary containing three entries: 'verb', 'noun' and 'action'. Entries should map to a 2D np.ndarray of shape (n_instances, n_classes) where the index is the predicted rank of the class at that index.
top_k (Union[int, Tuple[int, ...]]) – The set of k values to compute top-k accuracy for.
- Returns
Dictionary with the structure:

verb: list[float, length = len(top_k)]
noun: list[float, length = len(top_k)]
action: list[float, length = len(top_k)]
-
epic_kitchens.metrics.compute_metrics(groundtruth_df, scores, many_shot_verbs=None, many_shot_nouns=None, many_shot_actions=None, action_priors=None)[source]¶
Compute the EPIC action recognition evaluation metrics from scores given ground truth labels in groundtruth_df.
- Parameters
groundtruth_df (DataFrame) – DataFrame containing verb_class: int, noun_class: int. This function will add an action_class column containing the action ID obtained from epic_kitchens.meta.action_id_from_verb_noun().
scores (Dict[str, Union[ndarray, Dict[int, float]]]) – Dictionary containing 'verb', 'noun' and (optionally) 'action' entries. 'verb' and 'noun' should map to a 2D np.ndarray of shape (n_instances, n_classes) where each element is the predicted score of that class. 'action' should map to a dictionary of action keys to scores. The order of the scores array should be the same as the order in groundtruth_df.
many_shot_verbs (Optional[ndarray]) – The set of verb classes that are considered many shot. If not provided they are loaded from epic_kitchens.meta.many_shot_verbs()
many_shot_nouns (Optional[ndarray]) – The set of noun classes that are considered many shot. If not provided they are loaded from epic_kitchens.meta.many_shot_nouns()
many_shot_actions (Optional[ndarray]) – The set of action classes that are considered many shot. If not provided they are loaded from epic_kitchens.meta.many_shot_actions()
action_priors (Optional[ndarray]) – A (n_verbs, n_nouns) shaped array containing the action prior used to weight action predictions.
- Returns
A dictionary containing all metrics with the following structure:

accuracy:
    verb: list[float, length 2]
    noun: list[float, length 2]
    action: list[float, length 2]
precision:
    verb: float
    noun: float
    action: float
recall:
    verb: float
    noun: float
    action: float

Accuracy lists contain the top-k metrics like so: [top_1, top_5]. The precision and recall metrics are macro averaged and computed over the many-shot classes.
- Raises
ValueError – If the shapes of the scores arrays are not correct, or the lengths of groundtruth_df and the scores arrays are not equal, or if groundtruth_df doesn't have the specified columns.
-
epic_kitchens.metrics.precision_recall(rankings, labels, classes=None)[source]¶
Computes precision and recall from rankings.
- Returns
Tuple of (precision, recall) where precision is a 1D array of shape (len(classes),), and recall is a 1D array of shape (len(classes),)
- Raises
ValueError – If the dimensionality of the rankings or labels is incorrect, or if the length of the rankings and labels are not equal, or if the set of the provided classes is not a subset of the classes present in labels.
-
epic_kitchens.metrics.topk_accuracy(rankings, labels, ks=(1, 5))[source]¶
Computes top-k accuracies for different values of k from rankings.
- Returns
Top-k accuracy for each k in ks. If only one k is provided, then only a single float is returned.
- Raises
ValueError – If the dimensionality of the rankings or labels is incorrect, or if the length of rankings and labels aren't equal.
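A minimal sketch of the top-k computation over such a rank matrix (a hypothetical re-implementation based on the documented input format, not the library source):

```python
import numpy as np

def topk_accuracy_sketch(rankings, labels, ks=(1, 5)):
    """rankings: (n_instances, n_classes) class indices in descending predicted order."""
    # A hit at k means the true label appears in the first k ranked classes.
    hits = [np.mean(np.any(rankings[:, :k] == labels[:, None], axis=1)) for k in ks]
    return float(hits[0]) if len(hits) == 1 else [float(h) for h in hits]

rankings = np.array([[2, 0, 1],
                     [1, 2, 0]])
labels = np.array([2, 0])
top1 = topk_accuracy_sketch(rankings, labels, ks=(1,))  # only row 0 is right
top3 = topk_accuracy_sketch(rankings, labels, ks=(3,))  # every label appears
```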
epic_kitchens.scoring¶
-
epic_kitchens.scoring.compute_action_scores(verb_scores, noun_scores, top_k=100, action_priors=None)[source]¶
Given the predicted verb and noun scores, compute action scores by \(p(A = (v, n)) = p(V = v)p(N = n)\).
- Parameters
verb_scores (ndarray) – 2D array of verb scores (n_instances, n_verbs).
noun_scores (ndarray) – 2D array of noun scores (n_instances, n_nouns).
top_k (int) – Number of highest scored actions to compute.
action_priors (Optional[ndarray]) – 2D array of action priors (n_verbs, n_nouns). These don't have to sum to one, so you can provide the training counts of \((v, n)\) occurrences (to minimize numerical instability).
- Returns
A tuple ((verbs, nouns), action_scores) where verbs and nouns are 2D arrays of shape (n_instances, top_k) containing the classes constituting the top-k action scores. action_scores is a 2D array of shape (n_instances, top_k) where action_scores[i, j] corresponds to the score for the action class (verbs[i, j], nouns[i, j]). The scores are sorted in descending order, i.e. action_scores[i, j] >= action_scores[i, j + 1].
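The outer-product construction can be sketched as follows; this is a hypothetical re-implementation of the documented behaviour, omitting the action_priors weighting:

```python
import numpy as np

def action_scores_sketch(verb_scores, noun_scores, top_k=2):
    """Score every (verb, noun) pair as p(V=v) * p(N=n) and keep the top_k."""
    n_instances, n_verbs = verb_scores.shape
    n_nouns = noun_scores.shape[1]
    # (n_instances, n_verbs, n_nouns) grid of per-pair products:
    grid = verb_scores[:, :, None] * noun_scores[:, None, :]
    flat = grid.reshape(n_instances, -1)
    top = np.argsort(flat, axis=1)[:, ::-1][:, :top_k]  # best flat indices first
    verbs, nouns = np.unravel_index(top, (n_verbs, n_nouns))
    return (verbs, nouns), np.take_along_axis(flat, top, axis=1)

(verbs, nouns), scores = action_scores_sketch(
    np.array([[0.9, 0.1]]), np.array([[0.2, 0.8]]), top_k=2)
# best action is (verb 0, noun 1) with score 0.9 * 0.8 = 0.72
```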
-
epic_kitchens.scoring.scores_dict_to_ranks(scores_dict)[source]¶
Convert a dictionary of task to scores to a dictionary of task to ranks.
-
epic_kitchens.scoring.scores_to_ranks(scores)[source]¶
Convert scores to ranks.
- Parameters
scores (Union[ndarray, List[Dict[int, float]]]) – A 2D array of scores of shape (n_instances, n_classes), or a list of dictionaries, where each dictionary represents the sparse scores for a task. The key: value pairs of the dictionary represent the class: score mapping.
- Returns
A 2D array of ranks (n_instances, n_classes). Each row contains the ranked classes in descending order, i.e. ranks[0, i] is ranked higher than ranks[0, i+1]. The index is the rank, and the element the class at that rank.
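For the dense-array case, the conversion reduces to a descending argsort; a hypothetical sketch:

```python
import numpy as np

def ranks_sketch(scores):
    """Each row lists class indices from highest to lowest score."""
    return np.argsort(scores, axis=-1)[..., ::-1]

ranks = ranks_sketch(np.array([[0.1, 0.7, 0.2]]))
# class 1 (score 0.7) ranks first, then class 2, then class 0
```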
-
epic_kitchens.scoring.softmax(x)[source]¶
Compute the softmax of the 1D or 2D array x.
- Parameters
x (ndarray) – a 1D or 2D array. If 1D, then it is assumed that it is a single class score vector. Otherwise, if x is 2D, then each row is assumed to be a class score vector.
Examples
>>> res = softmax(np.array([0, 200, 10]))
>>> np.sum(res)
1.0
>>> np.all(np.abs(res - np.array([0, 1, 0])) < 0.0001)
True
>>> res = softmax(np.array([[0, 200, 10], [0, 10, 200], [200, 0, 10]]))
>>> np.argsort(res, axis=1)
array([[0, 2, 1],
       [0, 1, 2],
       [1, 2, 0]])
>>> np.sum(res, axis=1)
array([1., 1., 1.])
>>> res = softmax(np.array([[0, 200, 10], [0, 10, 200]]))
>>> np.sum(res, axis=1)
array([1., 1.])
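A numerically stable version consistent with the doctests above can be sketched as follows (a hypothetical re-implementation, not the library source):

```python
import numpy as np

def softmax_sketch(x):
    """Softmax over the last axis of a 1D or 2D array, with overflow guarded."""
    x = np.asarray(x, dtype=float)
    x2 = np.atleast_2d(x)
    shifted = x2 - x2.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(shifted)
    out = e / e.sum(axis=1, keepdims=True)
    return out[0] if x.ndim == 1 else out
```

Subtracting the per-row maximum leaves the result unchanged but prevents `np.exp` overflowing on large scores such as 200.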
-
epic_kitchens.scoring.top_scores(scores, top_k=100)[source]¶
Return the top_k class indices and scores in descending order.
- Returns
A tuple containing two arrays, (ranked_classes, scores), where ranked_classes contains the classes in descending order of score, and scores contains the corresponding score for each class, i.e. ranked_classes[..., i] has score scores[..., i].
Examples
>>> top_scores(np.array([0.2, 0.6, 0.1, 0.04, 0.06]), top_k=3)
(array([1, 0, 2]), array([0.6, 0.2, 0.1]))
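The doctest above is reproduced by a descending argsort truncated to top_k; a hypothetical sketch:

```python
import numpy as np

def top_scores_sketch(scores, top_k=3):
    """Return (ranked_classes, scores) for the top_k entries, best first."""
    scores = np.asarray(scores)
    order = np.argsort(scores, axis=-1)[..., ::-1][..., :top_k]
    return order, np.take_along_axis(scores, order, axis=-1)

classes, best = top_scores_sketch(np.array([0.2, 0.6, 0.1, 0.04, 0.06]), top_k=3)
# classes -> [1, 0, 2], best -> [0.6, 0.2, 0.1]
```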
epic_kitchens.preprocessing¶
Pre-processing tools to munge data into a format suitable for training
epic_kitchens.preprocessing.split_segments¶
Program for splitting frames into action segments. See Action segmentation for usage details.
epic_kitchens.labels¶
Column names present in a labels dataframe.
Rather than accessing column names directly, we suggest you import these constants and use them to access the data in case the names change at any point.
-
epic_kitchens.labels.NARRATION_COL = 'narration'¶
Narration column name, the original narration by the participant describing the action performed
e.g. "close fridge"
-
epic_kitchens.labels.NOUNS_CLASS_COL = 'all_noun_classes'¶
Noun classes column name, the classes corresponding to each noun extracted from the narration
e.g. [10]
-
epic_kitchens.labels.NOUNS_COL = 'all_nouns'¶
Nouns column name, all nouns extracted from the narration
e.g. ["fridge"]
-
epic_kitchens.labels.NOUN_CLASS_COL = 'noun_class'¶
Noun class column name, the class corresponding to the first noun extracted from the narration
e.g. 10
-
epic_kitchens.labels.NOUN_COL = 'noun'¶
Noun column name, the first noun extracted from the narration
e.g. "fridge"
-
epic_kitchens.labels.PARTICIPANT_ID_COL = 'participant_id'¶
Participant ID column name, the identifier corresponding to an individual
e.g. "P03"
-
epic_kitchens.labels.START_F_COL = 'start_frame'¶
Start frame column name, the frame corresponding to the starting timestamp
e.g. 85430
-
epic_kitchens.labels.START_TS_COL = 'start_timestamp'¶
Start timestamp column name, the timestamp of the start of the action segment
e.g. "00:23:43.847"
-
epic_kitchens.labels.STOP_F_COL = 'stop_frame'¶
Stop frame column name, the frame corresponding to the ending timestamp
e.g. 85643
-
epic_kitchens.labels.STOP_TS_COL = 'stop_timestamp'¶
Stop timestamp column name, the timestamp of the end of the action segment
e.g. "00:23:47.212"
-
epic_kitchens.labels.UID_COL = 'uid'¶
UID column name, a unique identifier for each action segment
e.g. 6374
-
epic_kitchens.labels.VERB_CLASS_COL = 'verb_class'¶
Verb class column name, the class corresponding to the verb extracted from the narration
e.g. 3
-
epic_kitchens.labels.VERB_COL = 'verb'¶
Verb column name, the first verb extracted from the narration
e.g. "close"
-
epic_kitchens.labels.VIDEO_ID_COL = 'video_id'¶
Video ID column name, an identifier for a specific video of the form Pdd_dd, where the first two digits are the participant ID and the last two digits the video ID
e.g. "P03_01"
epic_kitchens.time¶
Functions for converting between frames and timestamps
-
epic_kitchens.time.flow_frame_count(rgb_frame, stride, dilation)[source]¶
Get the number of frames in an optical flow segment given the number of frames in the corresponding RGB segment from which the flow was extracted with parameters (stride, dilation).
- Returns
The number of optical flow frames
Examples
>>> flow_frame_count(6, 1, 1)
5
>>> flow_frame_count(6, 2, 1)
3
>>> flow_frame_count(6, 1, 2)
4
>>> flow_frame_count(6, 2, 2)
2
>>> flow_frame_count(6, 3, 1)
2
>>> flow_frame_count(6, 1, 3)
3
>>> flow_frame_count(7, 1, 1)
6
>>> flow_frame_count(7, 2, 1)
3
>>> flow_frame_count(7, 1, 2)
5
>>> flow_frame_count(7, 2, 2)
3
>>> flow_frame_count(7, 3, 1)
2
>>> flow_frame_count(7, 1, 3)
4
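All twelve doctest values above are consistent with the closed form below, inferred from the examples rather than taken from the library source: a flow frame pairs RGB frames t and t + dilation, and pair starts are visited every stride frames:

```python
def flow_frames_sketch(rgb_frames, stride, dilation):
    # The last usable pair starts at 0-based index rgb_frames - dilation - 1,
    # and pair starts are sampled every `stride` frames from index 0.
    return (rgb_frames - dilation - 1) // stride + 1

print(flow_frames_sketch(6, 1, 1))  # 5
print(flow_frames_sketch(7, 2, 2))  # 3
```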
-
epic_kitchens.time.seconds_to_timestamp(total_seconds)[source]¶
Convert seconds into a timestamp.
- Parameters
total_seconds (float) – time in seconds
- Returns
timestamp representing total_seconds
Examples
>>> seconds_to_timestamp(1)
'00:00:1.000'
>>> seconds_to_timestamp(1.1)
'00:00:1.100'
>>> seconds_to_timestamp(60)
'00:01:0.000'
>>> seconds_to_timestamp(61)
'00:01:1.000'
>>> seconds_to_timestamp(60 * 60 + 1)
'01:00:1.000'
>>> seconds_to_timestamp(60 * 60 + 60 + 1)
'01:01:1.000'
>>> seconds_to_timestamp(1225.78500002)
'00:20:25.785'
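Note that the doctest outputs above leave the seconds field unpadded ('00:00:1.000' rather than '00:00:01.000'). A sketch reproducing that exact behaviour (a hypothetical re-implementation based on the doctests):

```python
def seconds_to_timestamp_sketch(total_seconds):
    hours = int(total_seconds) // 3600
    minutes = (int(total_seconds) // 60) % 60
    seconds = total_seconds - hours * 3600 - minutes * 60
    # "{:.3f}" does not zero-pad the integer part, matching the doctests.
    return "{:02d}:{:02d}:{:.3f}".format(hours, minutes, seconds)

print(seconds_to_timestamp_sketch(61))  # 00:01:1.000
```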
-
epic_kitchens.time.timestamp_to_frame(timestamp, fps)[source]¶
Convert timestamp to frame number given the FPS of the extracted frames.
- Returns
frame corresponding to timestamp
Examples
>>> timestamp_to_frame("00:00:00", 29.97)
1
>>> timestamp_to_frame("00:00:01", 29.97)
29
>>> timestamp_to_frame("00:00:01", 59.94)
59
>>> timestamp_to_frame("00:01:00", 60)
3600
>>> timestamp_to_frame("01:00:00", 60)
216000
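The doctests above are consistent with 1-indexed frames and truncation of fractional frames; a hypothetical sketch inferred from those examples:

```python
def timestamp_to_frame_sketch(timestamp, fps):
    hours, minutes, seconds = (float(part) for part in timestamp.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    # Frames are 1-indexed; fractional frames are truncated.
    return max(int(total_seconds * fps), 1)

print(timestamp_to_frame_sketch("00:00:01", 29.97))  # 29
```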
-
epic_kitchens.time.timestamp_to_seconds(timestamp)[source]¶
Convert a timestamp into a total number of seconds.
- Parameters
timestamp (str) – formatted as HH:MM:SS[.FractionalPart]
- Returns
timestamp converted to seconds
Examples
>>> timestamp_to_seconds("00:00:00")
0.0
>>> timestamp_to_seconds("00:00:05")
5.0
>>> timestamp_to_seconds("00:00:05.5")
5.5
>>> timestamp_to_seconds("00:01:05.5")
65.5
>>> timestamp_to_seconds("01:01:05.5")
3665.5
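The conversion is a straightforward weighted sum of the timestamp fields; a minimal sketch consistent with the doctests above:

```python
def timestamp_to_seconds_sketch(timestamp):
    """Parse 'HH:MM:SS[.FractionalPart]' into a float number of seconds."""
    hours, minutes, seconds = (float(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds

print(timestamp_to_seconds_sketch("01:01:05.5"))  # 3665.5
```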
epic_kitchens.video¶
-
class epic_kitchens.video.FlowModalityIterator(dilation=1, stride=1, bound=20, rgb_fps=59.94)[source]¶
Bases: epic_kitchens.video.ModalityIterator
Iterator for optical flow \((u, v)\) frames
-
class epic_kitchens.video.ModalityIterator[source]¶
Bases: abc.ABC
Interface that a modality extracted from video must implement
-
class epic_kitchens.video.RGBModalityIterator(fps)[source]¶
Bases: epic_kitchens.video.ModalityIterator
Iterator for RGB frames
-
epic_kitchens.video.get_narration(annotation)[source]¶
Get narration from annotation row; defaults to "unnarrated" if the row has no narration column.
-
epic_kitchens.video.iterate_frame_dir(root)[source]¶
Iterate over a directory of video dirs with the hierarchy root/P01/P01_01/
-
epic_kitchens.video.split_dataset_frames(modality_iterator, frames_dir, segment_root_dir, annotations, frame_format='frame%06d.jpg', pattern=re.compile('.*'))[source]¶
Split dumped video frames from frames_dir into directories within segment_root_dir for each video segment defined in annotations.
- Parameters
modality_iterator (ModalityIterator) – Modality iterator of frames
frames_dir (Path) – Directory containing dumped frames
segment_root_dir (Path) – Directory to write split segments to
annotations (DataFrame) – Dataframe containing segment information
frame_format (str, optional) – Old style string format that must contain a single %d formatter describing the file name format of the dumped frames.
pattern (re.Pattern, optional) – Regexp to match video directories
- Return type
None
-
epic_kitchens.video.split_video_frames(modality_iterator, frame_format, video_annotations, segment_root_dir, video_dir)[source]¶
Split frames from a single video file stored in video_dir into segment directories stored in segment_root_dir.
- Parameters
modality_iterator (ModalityIterator) – Modality iterator
frame_format (str) – Old style string format that must contain a single %d formatter describing the file name format of the dumped frames.
video_annotations (DataFrame) – Dataframe containing rows only corresponding to video frames stored in video_dir
segment_root_dir (Path) – Directory to write split segments to
video_dir (Path) – Directory containing dumped frames for a single video
- Return type
None