data_loading_util
Author: Heli Qi
Affiliation: NAIST
Date: 2022.11
get_file_birthtime(file_path, readable_time=False)
Get the creation time of a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path of the file. | required |
readable_time | bool | If True, the output is a string in a readable format. Defaults to False. | False |
Returns:
Type | Description |
---|---|
float or str | The creation time of the file. If readable_time is True, a string in a readable format is returned. |
Source code in speechain/utilbox/data_loading_util.py
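The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the speechain implementation: the name `file_birthtime_sketch`, the fallback to `st_ctime`, and the use of `time.ctime` for the readable format are all assumptions.

```python
import os
import tempfile
import time


def file_birthtime_sketch(file_path, readable_time=False):
    """Hypothetical stand-in for get_file_birthtime (assumed behaviour)."""
    stat = os.stat(file_path)
    # st_birthtime is only available on some platforms (e.g. macOS, BSD).
    # Assumption: fall back to st_ctime elsewhere, which is the creation
    # time on Windows but the last metadata change on Linux.
    birthtime = getattr(stat, "st_birthtime", stat.st_ctime)
    if readable_time:
        # Assumption: any human-readable format is acceptable here.
        return time.ctime(birthtime)
    return birthtime


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
raw_time = file_birthtime_sketch(demo_path)
text_time = file_birthtime_sketch(demo_path, readable_time=True)
os.remove(demo_path)
print(type(raw_time).__name__, type(text_time).__name__)
```

The two calls differ only in the return type: a timestamp in seconds when readable_time is False, and a formatted string otherwise.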
load_idx2data_file(file_path, data_type=str, separator=' ', do_separate=True)
Load a dictionary from a file or a list of files containing key-value pairs, where the key is the index of a data instance and the value is the target data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str or List[str] | Absolute path of the file(s) to be loaded. | required |
data_type | type | The data type of the values in the returned dictionary. It should be a Python built-in data type. Defaults to str. | str |
separator | str | The separator between the data instance index and the data value in each line of the file. Defaults to ' '. | ' ' |
do_separate | bool | Whether to separate each row by the given separator. Defaults to True. | True |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing key-value pairs, where the key is the index of a data instance and the value is the target data. |
Raises:
Type | Description |
---|---|
AssertionError | Raised if the given file does not exist. |
Source code in speechain/utilbox/data_loading_util.py
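To make the file layout concrete, here is a minimal sketch of the default case (do_separate=True), where each line holds an index, the separator, and a value. It is not the speechain implementation: `load_idx2data_sketch` is a hypothetical name, and splitting only on the first separator is an assumption. The do_separate=False case is left out because its exact behaviour is not described above.

```python
import os
import tempfile


def load_idx2data_sketch(file_path, data_type=str, separator=" "):
    """Hypothetical stand-in for load_idx2data_file with do_separate=True."""
    # Accept either a single path or a list of paths.
    paths = [file_path] if isinstance(file_path, str) else list(file_path)
    idx2data = {}
    for path in paths:
        # Mirrors the documented AssertionError for missing files.
        assert os.path.exists(path), f"{path} does not exist"
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                # Assumption: split on the first separator only, so the
                # value itself may contain the separator (e.g. a sentence).
                idx, _, value = line.partition(separator)
                idx2data[idx] = data_type(value)
    return idx2data


# Demo: one file of float values and one file of text values.
with tempfile.TemporaryDirectory() as tmp_dir:
    len_path = os.path.join(tmp_dir, "idx2len")
    text_path = os.path.join(tmp_dir, "idx2text")
    with open(len_path, "w", encoding="utf-8") as f:
        f.write("utt1 3.5\nutt2 4.25\n")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("utt1 hello world\n")
    idx2len = load_idx2data_sketch(len_path, data_type=float)
    idx2text = load_idx2data_sketch(text_path)
print(idx2len, idx2text)
```

Passing data_type=float converts each value while the keys stay strings, which matches the Dict[str, Any] return type listed above.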
read_data_by_path(data_path, return_tensor=False, return_sample_rate=False)
Read data from the file at the specified path, choosing how to read it according to the file format and extension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_path | str | The path where the data file to be read is placed. | required |
return_tensor | bool | Whether to return data as torch.Tensor. Defaults to False. | False |
return_sample_rate | bool | Whether to return the sample rate. Defaults to False. | False |
Returns:
Type | Description |
---|---|
np.ndarray or torch.Tensor | Array-like data. If return_tensor is False, the data type will be numpy.ndarray; otherwise, the data type will be torch.Tensor. |
Raises:
Type | Description |
---|---|
NotImplementedError | Raised if the file format is not supported. |
Source code in speechain/utilbox/data_loading_util.py
read_idx2data_file_to_dict(path_dict)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path_dict | Dict[str, str or List[str]] | The path dictionary of the 'idx2XXX' files to be read. In each key-value item, the key is the data name and the value is the path of the target 'idx2XXX' file. Multiple file paths can be given in a list. | required |
Returns:
Type | Description |
---|---|
(Dict[str, str], List[str]) | Both the result dictionary and the data index list are returned. |
Source code in speechain/utilbox/data_loading_util.py
search_file_in_subfolder(curr_query, tgt_match_fn=None, return_name=False, return_sorted=True)
Search for files in a directory and its subdirectories that satisfy a certain condition.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
curr_query | str | Path of the directory to search in. | required |
tgt_match_fn | callable | A function that takes a file name and returns a boolean value. If provided, only files that satisfy this condition are returned. If not provided, all files will be returned. Defaults to None. | None |
return_name | bool | If True, return file names instead of file paths. Defaults to False. | False |
return_sorted | bool | If True, the output list will be sorted in lexicographical order. Defaults to True. | True |
Returns:
Type | Description |
---|---|
list | A list of file paths or file names (depending on return_name) that satisfy tgt_match_fn. |
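The search described above can be approximated with `os.walk`. The following is a sketch under assumptions rather than the speechain implementation: `search_files_sketch` is a hypothetical name, and the recursive traversal via `os.walk` is a substitute for whatever the library does internally.

```python
import os
import tempfile


def search_files_sketch(curr_query, tgt_match_fn=None, return_name=False,
                        return_sorted=True):
    """Hypothetical stand-in for search_file_in_subfolder (assumed behaviour)."""
    results = []
    # os.walk visits curr_query and every directory below it.
    for dir_path, _, file_names in os.walk(curr_query):
        for name in file_names:
            # With no match function, every file is kept.
            if tgt_match_fn is None or tgt_match_fn(name):
                results.append(name if return_name
                               else os.path.join(dir_path, name))
    return sorted(results) if return_sorted else results


# Demo: two .wav files (one nested) and one .txt file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("b.wav", "a.txt", os.path.join("sub", "c.wav")):
        open(os.path.join(root, rel), "w").close()
    wav_names = search_files_sketch(
        root, tgt_match_fn=lambda n: n.endswith(".wav"), return_name=True
    )
    all_paths = search_files_sketch(root)
print(wav_names)  # ['b.wav', 'c.wav']
```

The match function receives the bare file name, not the full path, so a filter such as an extension check does not depend on where the file sits in the tree.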