Real-time indoor scene description for the visually impaired using autoencoder fusion strategies with visible cameras