dataframe - ValueError: Exception encountered when calling layer "tf.__operators__.getitem_20" (type SlicingOpLambda)

I was following the TensorFlow tutorial and tried to recreate the code on my own with a multi-label input feature, and ran into this error. My recreation of the sample code is shown below.

DataFrame creation:

import pandas as pd

sample_df = pd.DataFrame({
    "feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']],
    "feature_2": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']],
})
Output:

    feature_1   feature_2
0   [aa, bb, cc]    [aa, bb, cc]
1   [cc, dd, ee]    [cc, dd, ee]
2   [cc, aa, ee]    [cc, aa, ee]
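
As a quick side note (output shown as comments), pandas stores these list-valued cells with dtype object, which becomes relevant for the error further down:

print(sample_df.dtypes)
# feature_1    object
# feature_2    object
# dtype: object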

Input layers:

import tensorflow as tf

inputs = {}

inputs['feature_1'] = tf.keras.Input(shape=(), name='feature_1', dtype=tf.string)
inputs['feature_2'] = tf.keras.Input(shape=(), name='feature_2', dtype=tf.string)
Output:

{'feature_1': <KerasTensor: shape=(None,) dtype=string (created by layer 'feature_1')>,
 'feature_2': <KerasTensor: shape=(None,) dtype=string (created by layer 'feature_2')>}

Preprocessing layers:

preprocessed = []

for name, column in sample_df.items():
    vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
    lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')

    print(f'name: {name}')
    print(f'vocab: {vocab}\n')

    # Add a trailing axis so each scalar string becomes a length-1 token list for the lookup.
    x = inputs[name][:, tf.newaxis]
    x = lookup(x)
    preprocessed.append(x)
Output:

name: feature_1
vocab: ['aa', 'bb', 'cc', 'dd', 'ee']

name: feature_2
vocab: ['aa', 'bb', 'cc', 'dd', 'ee']

[<KerasTensor: shape=(None, 6) dtype=float32 (created by layer 'string_lookup_27')>,
 <KerasTensor: shape=(None, 6) dtype=float32 (created by layer 'string_lookup_28')>]
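
For reference, the width of 6 comes from the 5 vocabulary tokens plus the default out-of-vocabulary index. Below is a minimal sketch of the same StringLookup layer called eagerly on a batch with several tokens per row (output shown as a comment):

import tensorflow as tf

vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')

# Index 0 is reserved for out-of-vocabulary tokens, so each vector has width 6.
print(lookup(tf.constant([['aa', 'bb', 'cc']])))
# tf.Tensor([[0. 1. 1. 1. 0. 0.]], shape=(1, 6), dtype=float32)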

Model creation:

preprocessed_result = tf.concat(preprocessed, axis=-1)
preprocessor = tf.keras.Model(inputs, preprocessed_result)
tf.keras.utils.plot_model(preprocessor, rankdir="LR", show_shapes=True)
Output:

<KerasTensor: shape=(None, 12) dtype=float32 (created by layer 'tf.concat_4')>

Error:

preprocessor(dict(sample_df.iloc[:1]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
.../sample.ipynb Cell 63' in <cell line: 1>()
----> 1 preprocessor(dict(sample_df.iloc[:1]))

File ~/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
    100     dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)

ValueError: Exception encountered when calling layer "tf.__operators__.getitem_20" (type SlicingOpLambda).

Failed to convert a NumPy array to a Tensor (Unsupported object type list).

Call arguments received:
  • tensor=0    [aa, bb, cc]
Name: feature_2, dtype: object
  • slice_spec=({'start': 'None', 'stop': 'None', 'step': 'None'}, 'None')
  • var=None
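
To narrow it down, the same conversion failure can be reproduced outside the model. A minimal sketch, assuming the sample_df defined above (the last line only shows that the values themselves convert fine once pulled out of the object-dtype Series):

import tensorflow as tf

# dict(sample_df.iloc[:1]) hands the model pandas Series whose cells are
# Python lists (dtype object); tf.convert_to_tensor cannot handle those.
try:
    tf.convert_to_tensor(sample_df['feature_1'].iloc[:1])
except ValueError as e:
    print(e)  # -> Failed to convert a NumPy array to a Tensor (Unsupported object type list).

# Converting the lists explicitly first works, e.g. as a ragged string tensor:
tf.ragged.constant(list(sample_df['feature_1'].iloc[:1]))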

This is the tutorial I was following: https://www.tensorflow.org/tutorials/load_data/pandas_dataframe#create_and_train_a_model

Any help with the error, or further insight into what is causing it, would be greatly appreciated. Thank you very much in advance.

Answer 1

I put together a workaround for anyone interested in or facing a similar problem. This is only a workaround, not a proper solution.

Workaround: since my multi-hot encodings are essentially binary, I simply broke each of them out into its own feature.

Sample code:

import pandas as pd

sample_df = pd.DataFrame({"feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']]})

# Collect every label that appears anywhere in feature_1.
feature_1_labels = set()

for i in range(sample_df.shape[0]):
    feature_1_labels.update(sample_df.iloc[i]['feature_1'])

# Add one binary column per label, initialised to 0.
for label in sorted(feature_1_labels):
    sample_df[label] = 0

# Flip the column to 1 wherever the label appears in that row's list.
for i in range(sample_df.shape[0]):
    for label in sample_df.iloc[i]['feature_1']:
        sample_df.iloc[i, sample_df.columns.get_loc(label)] = 1

sample_df
Output:

    feature_1   aa  bb  cc  dd  ee
0   [aa, bb, cc]    1   1   1   0   0
1   [cc, dd, ee]    0   0   1   1   1
2   [cc, aa, ee]    1   0   1   0   1

Note: doing this significantly increases the number of input features, which is something to keep in mind.
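
For completeness, here is a sketch of how the exploded binary columns could then be fed to the model as plain numeric inputs (the input names, shapes and dtypes below are my own choices, not from the tutorial):

import tensorflow as tf

label_columns = sorted(feature_1_labels)  # ['aa', 'bb', 'cc', 'dd', 'ee']

# One scalar float input per binary column.
inputs = {name: tf.keras.Input(shape=(1,), name=name, dtype=tf.float32)
          for name in label_columns}

# The columns are already 0/1, so they can be concatenated directly.
concatenated = tf.keras.layers.Concatenate()(list(inputs.values()))
preprocessor = tf.keras.Model(inputs, concatenated)

# Plain numeric columns convert cleanly, unlike the original list-valued column.
batch = {name: tf.constant(sample_df[name].iloc[:1].to_numpy().reshape(-1, 1),
                           dtype=tf.float32)
         for name in label_columns}
print(preprocessor(batch))  # shape (1, 5), e.g. [[1. 1. 1. 0. 0.]]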

Feel free to let me know of a better workaround or point out any mistakes I made :)
