Semi-Automatic 2D-to-3D Conversion Using Low-Rank Matrix Recovery

Abstract: Semi-automatic 2D-to-3D conversion is a promising solution for creating 3D stereoscopic content. However, the continuous depth transition between neighboring user-marked regions is lost when user scribbles are sparse. To address this problem, we develop a piecewise-continuity regularized low-rank matrix recovery method. Our approach is based on the observation that a depth-map can be decomposed into a low-rank matrix and an outlier matrix. First, an initial dense depth-map is interpolated from the user scribbles using the matting Laplacian scheme, under the assumption that the depth-map is piecewise-continuous. Second, a piecewise-continuity constrained low-rank recovery model removes the outliers introduced by the interpolation. Experimental comparisons with existing algorithms show that our method produces noticeably more continuous depth transitions between neighboring regions.


Introduction
With the development of 3D display technology, a growing variety of 3D electronic products, such as televisions, mobile phones, and projectors, are appearing in everyday life [1]. However, there is little 3D content to play on these devices; most videos and images are still in 2D. Thus, there is an urgent need for 2D-to-3D conversion, which generates 3D content from existing 2D images and videos.
The main challenge of 2D-to-3D conversion is how to retrieve the depth information that is lost in the capture process. Existing 2D-to-3D conversion methods can generally be divided into two categories: automatic and semi-automatic. Automatic conversion methods rely on different kinds of depth cues to generate depth-maps. Since the relationships between these cues and depth are nonlinear, current automatic methods usually make global assumptions about the scene. When the assumptions do not hold, depth errors appear. Therefore, the accuracy of depth-maps generated by automatic methods still cannot meet the demands of 3D display. Semi-automatic methods can produce higher-quality depth-maps because they combine depth cues with manual operations, and many semi-automatic schemes have been proposed in recent years. However, if the user scribbles are sparse, existing methods struggle to capture the continuous change of depth between neighboring regions, which leads to visual fatigue. Surprisingly, few semi-automatic works address this issue.
We tackle the issue by formulating depth estimation as a low-rank matrix recovery problem. Our work is motivated by the recent application of matrix completion to the colorization problem [2]. Unfortunately, the method in [2] cannot be applied to depth estimation directly. It relies on an additional regularization term to improve the matrix completion accuracy: since a color image can be converted to a monochrome image by a transform matrix, [2] uses this relation to add an extra regularization term to the matrix completion. However, no transform matrix is available that converts the estimated depth-map to the input image. To fix the problem, we assume that two neighboring pixels should have similar depth if their colors are similar. We formalize this premise as a local depth consistency interpolation, which is motivated by the matting Laplacian method [3]. Then, we develop a discontinuity constrained low-rank matrix recovery approach to refine the interpolated result.
Similar to StereoBrush [4], in our method the user brushes sparse scribbles on an input color image, where lighter intensities indicate regions closer to the camera and darker intensities indicate regions farther away. By formulating our problem as a discontinuity constrained low-rank matrix recovery, the depth transition between neighboring regions becomes more continuous while depth boundaries are preserved. In particular, the main contributions of our work are:
• To the best of our knowledge, our work is the first to formulate semi-automatic 2D-to-3D conversion as a low-rank matrix recovery problem. The low-rank matrix representation can refine depth-maps by removing the outlier term. This opens the way to applying recent advances in low-rank methods to the 2D-to-3D conversion problem.
• Low-rank matrix recovery works only when enough samples are available. Inspired by the matting Laplacian method [3], we develop a local depth consistency interpolation method that provides ample samples from the sparse user scribbles.
• We develop a quadratic cost regularized low-rank matrix recovery model to remove depth outliers while preserving object boundaries.
The rest of the paper is organized as follows. Recent semi-automatic depth estimation methods are introduced in Section 2. In Section 3, we formulate 2D-to-3D conversion as a low-rank matrix recovery problem and devise a discontinuity preserving smoothness term to improve the performance of the low-rank method. Section 4 gives a detailed explanation of the steps of our low-rank method. In Section 5, we derive the augmented Lagrange multiplier (ALM) algorithm [5] to solve our low-rank matrix recovery problem. In Section 6, we demonstrate the performance of our method in comparison with related semi-automatic depth estimation approaches.

Related works
In this section, only related semi-automatic 2D-to-3D conversion approaches are discussed. A more detailed review of related methods can be found in [6]. Wu et al. [7] use an interactive segmentation tool to extract objects from the background, and then assign depth information to the segmented objects and the background separately. However, interactive segmentation is cumbersome and no easier than manual 2D-to-3D conversion. To make interactive segmentation easier, Aksoy et al. [8] over-segment the image into regions with geometrical convexity and intensity homogeneity, and then merge the regions according to the user scribbles and to geometry and intensity constraints. Once segmentation is done, the user is again required to mark strokes on the segmented objects to indicate relative depth ordering. In [9], Guttmann et al. propose a segmentation-like depth estimation method in which each user label is considered not as a separate object but as a separate depth. User scribbles are marked on a few key frames to assign desired depth values. Next, these scribbles are used to train support vector machine (SVM) classifiers on scale-invariant feature transform (SIFT) features for each key frame. Then, pixels classified with high confidence by the SVM classifiers are assigned the corresponding depth. Finally, a linear system that combines the initial user constraints with spatial and temporal smoothness constraints is solved via least squares to obtain the remaining depth values. Solving this system is equivalent to solving the random walks (RW) problem.
The issue with [9] is its computational complexity: it requires SIFT feature extraction and SVM classifier training to obtain the final depth-map. In [10], the SVM classifiers are removed and only user scribbles are used to generate the initial depth-map. The final depth-map is interpolated by the RW segmentation framework developed by Grady [11]. The limitation of [10] is that object boundaries in the resulting depth-map are lost due to the smoothing properties of RW. To solve this problem, Phan [12] combines RW [11] and Graph Cuts (GC) [13], exploiting both the smoothing properties of RW and the strong object boundaries provided by GC. The initial depth-map is generated by GC, and the edges in RW are then weighted by this initial depth-map. The combination cleans up object boundaries while maintaining the smooth gradients of RW. However, when user scribbles are sparse, the combination cannot capture the continuous depth transition between neighboring regions. To improve depth quality at object boundaries, in [14], Yuan [3] to perform sparse interpolation for depth from defocus from a single image. The defocus map interpolated by the matting Laplacian can capture the continuous change of depth. This is the desired property, as it guarantees a continuous transition between neighboring regions in the depth-map. However, the issue with the matting Laplacian is that texture details in the input image are introduced into the depth-map during interpolation. Inspired by the recent success of low-rank methods in image processing, e.g., colorization [2], image restoration [16], and texture repairing [17], we develop a depth estimation approach combining the matting Laplacian and a low-rank method, which captures continuous depth changes while removing the texture details introduced by the matting Laplacian scheme. We note that Lu et al. [18] also apply low-rank constraints to depth enhancement. The differences between our method and theirs are: (1) Lu et al. [18] assume that RGB-D patches lie in a low-dimensional subspace, whereas we apply low-rank regularization to the whole depth-map; (2) the focus of [18] is depth-map completion, whereas our problem is sparse-to-dense depth propagation in semi-automatic 2D-to-3D conversion.
http://www.i-joe.org

Problem formulation
First of all, we provide here some notations used throughout the paper. Scalars are non-bold, vectors are bold lowercase and matrices are bold capital. We assume that matrices are stored in column-major order and one-based indexing is used.
The conjugate transpose of $\mathbf{A}$ is $\mathbf{A}^{*}$, and similarly for vectors. The trace of a matrix is denoted by $\mathrm{Tr}(\cdot)$. The Frobenius norm of a matrix $\mathbf{A}$ is denoted by $\|\mathbf{A}\|_F$, the $\ell_0$ norm by $\|\mathbf{A}\|_0$ (i.e., the total number of non-zero elements in $\mathbf{A}$), the $\ell_1$ norm by $\|\mathbf{A}\|_1$, and the standard inner product between two matrices $\mathbf{A}$ and $\mathbf{B}$ by $\langle \mathbf{A}, \mathbf{B} \rangle$. $\mathcal{P}$ is a reshape operator which converts a matrix to a column vector.
Suppose that we are given a color image and sparse user scribbles, and that we want to estimate a dense depth-map from them. This is an ill-posed problem, since we have no knowledge of the underlying true depth and the best approximation can be evaluated in many ways. To make the problem well defined, prior knowledge must be introduced. We consider the problem under two assumptions:
Assumption 1. The depth-maps are piecewise-continuous.
Assumption 2. The depth-maps are low-rank.
Assumption 1 is reasonable because depth-maps are largely uniform inside objects, and depth discontinuities appear only at object boundaries. Assumption 2 derives from the fact that any image can be effectively approximated by a low-rank matrix plus a sparse matrix [19]. With Assumption 1, we obtain an initial estimated depth-map using local depth consistency interpolation, which is motivated by the matting Laplacian method [3]. Then, a refined depth-map is extracted from the initial depth-map by formulating depth estimation as a low-rank matrix recovery problem under Assumption 2. Further, to thoroughly remove the texture details introduced by interpolation, we add a discontinuity constrained smoothness regularization term to the low-rank method.
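As a rough illustration of the two assumptions, the following Python sketch builds a synthetic piecewise-continuous depth-map and compares its rank before and after texture-like perturbations are added. The image size, the depth values, and the noise level are hypothetical choices made only for this illustration; they are not taken from the experiments.

```python
import numpy as np

h, w = 120, 160
rows = np.linspace(0.0, 1.0, h)[:, None]

# Background: a smooth ramp that gets closer (lighter) toward the bottom.
depth = np.repeat(0.2 + 0.5 * rows, w, axis=1)
# Foreground: one object at constant depth, bounded by sharp edges.
depth[30:90, 50:110] = 0.9

# Stand-in for texture leaking in from the color image (hypothetical level).
rng = np.random.default_rng(0)
textured = depth + 0.05 * rng.standard_normal((h, w))

rank_clean = np.linalg.matrix_rank(depth)
rank_textured = np.linalg.matrix_rank(textured)
print("rank of clean depth-map:   ", rank_clean)
print("rank of textured depth-map:", rank_textured)
```

The clean map has a very small rank, while the perturbed map is close to full rank. This is the situation that the low-rank plus outlier decomposition is meant to undo.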

Method
The workflow of the proposed method is shown in Fig. 1. In our approach, the user draws scribbles on the input color image, producing a scribble map overlaid on the original image that indicates the desired depth: the lighter the scribble intensity or color, the larger the depth value. Then, we use image subtraction to extract the sparse depth hypothesis. To get the initial dense depth-map, we apply the matting Laplacian method [3] to perform sparse interpolation under Assumption 1. The main idea of the matting Laplacian scheme is that depth can be represented as a linear function of colors in a small window. While the initial depth from the matting Laplacian scheme captures continuous changes between neighboring regions, it also introduces texture details from the input color image into the depth-map. These texture details damage the depth uniformity inside an object, which leads to visual fatigue. Since a depth-map is generally piecewise-continuous, there are strong correlations between neighboring regions, and we can represent the depth-map by a low-rank matrix. The texture details are then treated as the outlier term. However, because the texture details are not sparse, the typical low-rank method alone cannot remove the texture thoroughly. Therefore, we introduce a constraint term into the low-rank method. The constraint term smooths depth in low gradient regions while preserving depth in high gradient regions. Generally, the gradient is low inside objects and high at object boundaries. Thus, the constraint term can smooth depth while preserving object boundaries. We call it the discontinuity preserving constraint term. With the help of this term, the low-rank method can remove most texture details in the interpolated depth-map while preserving object boundaries.
With the depth recovered from the input color image, we can apply depth image based rendering (DIBR) to create a new view for 3D stereoscopic display. The anaglyph image is a popular representation for stereoscopic 3D. To generate an anaglyph image, the input image is given a red hue and the image synthesized by DIBR is given a cyan hue. The anaglyph image is the combination of the two tinted images.
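The rendering step can be sketched in Python as follows. This is a deliberately naive illustration, not the DIBR implementation used in the experiments: the function names, the `max_disparity` parameter, the shift direction, and the crude hole handling (disoccluded pixels keep the input color) are all assumptions made for the sketch. Depth is assumed to be normalized to [0, 1], with larger values closer to the camera.

```python
import numpy as np

def render_shifted_view(image, depth, max_disparity=8):
    """Naive DIBR: shift each pixel right by a disparity proportional to depth."""
    width = image.shape[1]
    disparity = np.rint(max_disparity * np.clip(depth, 0.0, 1.0)).astype(int)
    # Start from the input so that disoccluded pixels keep a plausible color.
    view = image.copy()
    # Paint far pixels (small disparity) first so that near pixels overwrite them.
    for d in range(max_disparity + 1):
        ys, xs = np.nonzero(disparity == d)
        new_xs = xs + d
        keep = new_xs < width
        view[ys[keep], new_xs[keep]] = image[ys[keep], xs[keep]]
    return view

def make_anaglyph(image, depth, max_disparity=8):
    """Red channel from the input view, green and blue (cyan) from the new view."""
    anaglyph = render_shifted_view(image, depth, max_disparity)
    anaglyph[..., 0] = image[..., 0]
    return anaglyph

# Tiny usage example with random data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
depth = np.ones((4, 6))
anaglyph = make_anaglyph(image, depth, max_disparity=2)
```

Processing one disparity level at a time avoids writing two source pixels to the same target within a single assignment, so nearer pixels reliably occlude farther ones.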

Local depth consistency interpolation
In [20], Candès et al. show that a low-rank matrix can be perfectly recovered with high probability provided that the number of sampled entries is larger than a certain bound. Thus, in [2], for each unlabeled pixel, monochrome intensity affinities are used to find all its neighboring labeled pixels, and the pixel is then labeled with the weighted sum of the neighbors' labels. However, in 2D-to-3D conversion, user scribbles are sparse and separated, so this simple propagation of neighboring labels does not work in our case. To propagate the sparse user labels to the entire image, we assume that pixels with similar colors should have roughly similar depth; namely, depth distributions are locally consistent. The local depth consistency interpolation is motivated by the matting Laplacian method [3]. Its intuition is that depth can be represented as a linear function of image colors in a small window.
The depth interpolation problem is formulated as an energy minimization problem, where the parameter $w$ balances the relative influence of user scribbles and color similarity. The optimal $\mathbf{d}_f$ can be obtained by solving (2) using the conjugate gradient algorithm. The final interpolated depth-map is obtained by reshaping $\mathbf{d}_f$ into an $h \times w$ matrix.
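To make the interpolation step concrete, the sketch below solves a problem of the same general shape with a hand-written conjugate gradient loop. It is a simplified stand-in, not the matting Laplacian itself: the Laplacian here is a plain 4-neighbor graph Laplacian with Gaussian color-affinity weights, and the energy is assumed to take the common quadratic form d^T L d + w (d - s)^T M (d - s), whose minimizer satisfies (L + w M) d = w M s. The names `neighbor_weights`, `apply_system`, `interpolate_depth`, `sigma`, and `w_data` are hypothetical.

```python
import numpy as np

def neighbor_weights(image, sigma=0.1):
    """Color-affinity weights between horizontal and vertical neighbors."""
    diff_x = np.sum((image[:, 1:] - image[:, :-1]) ** 2, axis=-1)
    diff_y = np.sum((image[1:, :] - image[:-1, :]) ** 2, axis=-1)
    return np.exp(-diff_x / (2 * sigma**2)), np.exp(-diff_y / (2 * sigma**2))

def apply_system(d, wx, wy, mask, w_data):
    """Compute (L + w_data * M) d without forming the matrix explicitly."""
    out = w_data * mask * d
    dx = d[:, 1:] - d[:, :-1]
    out[:, 1:] += wx * dx
    out[:, :-1] -= wx * dx
    dy = d[1:, :] - d[:-1, :]
    out[1:, :] += wy * dy
    out[:-1, :] -= wy * dy
    return out

def interpolate_depth(image, scribbles, mask, w_data=10.0, iterations=2000, tol=1e-8):
    """Propagate sparse scribble depths with conjugate gradient."""
    wx, wy = neighbor_weights(image)
    b = w_data * mask * scribbles
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - A x for the initial x = 0
    p = r.copy()
    rs = np.sum(r * r)
    for _ in range(iterations):
        if np.sqrt(rs) < tol:
            break
        ap = apply_system(p, wx, wy, mask, w_data)
        alpha = rs / np.sum(p * ap)
        x += alpha * p
        r -= alpha * ap
        rs_new = np.sum(r * r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Two flat color regions, one scribble in each.
image = np.zeros((16, 16, 3))
image[:, 8:] = 1.0
scribbles = np.zeros((16, 16))
mask = np.zeros((16, 16))
scribbles[8, 2], mask[8, 2] = 0.2, 1.0
scribbles[8, 13], mask[8, 13] = 0.8, 1.0
depth0 = interpolate_depth(image, scribbles, mask)
```

In this toy case each scribble fills its own color region and the strong color edge blocks propagation across the boundary. With a textured input, the same mechanism copies texture into the depth-map, which is the artifact the low-rank refinement is designed to remove.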

Low-rank matrix recovery
Let $\mathbf{D} \in \mathbb{R}^{h \times w}$ be the refined depth-map and $\mathbf{E} \in \mathbb{R}^{h \times w}$ the outlier term. We assume that $\mathbf{D}$ is low-rank and $\mathbf{E}$ is sparse. Formally, we obtain $\mathbf{D}$ by solving the rank and $\ell_0$ norm minimization problem (3). The problem in (3) is NP-hard. In [18], Candès et al. prove that nuclear norm minimization is the tightest convex relaxation of the NP-hard rank minimization problem. In [21], Elad demonstrates that the $\ell_0$ norm minimization problem can be approximated by convex $\ell_1$ norm minimization. Thus, the problem in (3) can be relaxed into the convex problem (4).
To preserve the depth discontinuity while removing texture details introduced by sparse interpolation, we add an extra regularization term to the low-rank matrix recovery problem. By introducing the regularization term into the low-rank method, Assumption 1 and Assumption 2 are used together. Thus, we extend the problem in (4) to the more robust form (5).
The regularization term in (5) comes from Assumption 1. Since depth-maps are piecewise-continuous, regions of uniform color should have uniform depth values, and depth edges should coincide with photometric edges. Because the magnitude of the image gradient is smaller in uniform regions and larger around image edges, the depth should be smoothed in uniform regions and preserved around image edges. Formally, we formulate the piecewise-continuous cost term as (6), where the weights $w$ are smaller in high gradient regions. Therefore, we smooth depth in low gradient regions and preserve depth in high gradient regions by minimizing $E(\mathbf{D})$. We can rewrite the piecewise-continuous cost of (6) for all pixels $i = 1, 2, \ldots, N$ in matrix form as (7), where $E(\mathbf{D})$ is just the regularization term in (5).
iJOE - Vol. 14, No. 1, 2018
By introducing the piecewise-continuous regularization term into (4), we obtain the robust low-rank matrix recovery formulation given in (5), which preserves depth discontinuities while removing the texture details introduced by sparse interpolation. Thus, depth refinement reduces to finding the best solution of (5).
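As a reading aid, the chain from (3) to (5) can be written in the standard robust principal component analysis form. The notation below is an assumption of this sketch and may differ from that of (3)-(5): $\mathbf{D}_0$ denotes the initial interpolated depth-map, $\lambda$ the weight of the outlier term, $\gamma$ the weight of the regularization term, and $\|\cdot\|_*$ the nuclear norm.

```latex
\begin{align*}
\text{rank and } \ell_0 \text{ form:} \quad
  & \min_{\mathbf{D},\mathbf{E}} \ \operatorname{rank}(\mathbf{D}) + \lambda \|\mathbf{E}\|_0
  && \text{s.t. } \mathbf{D}_0 = \mathbf{D} + \mathbf{E}, \\
\text{convex relaxation:} \quad
  & \min_{\mathbf{D},\mathbf{E}} \ \|\mathbf{D}\|_* + \lambda \|\mathbf{E}\|_1
  && \text{s.t. } \mathbf{D}_0 = \mathbf{D} + \mathbf{E}, \\
\text{regularized form:} \quad
  & \min_{\mathbf{D},\mathbf{E}} \ \|\mathbf{D}\|_* + \lambda \|\mathbf{E}\|_1 + \gamma\, E(\mathbf{D})
  && \text{s.t. } \mathbf{D}_0 = \mathbf{D} + \mathbf{E}.
\end{align*}
```

The first line corresponds to the NP-hard problem, the second replaces the rank by the nuclear norm and the $\ell_0$ norm by the $\ell_1$ norm, and the third adds the piecewise-continuous cost $E(\mathbf{D})$ so that both assumptions act together.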

Optimization algorithm
We use the ALM algorithm [5] to solve the problem in (5). To do so, we introduce a slack vector $\mathbf{l} \in \mathbb{R}^{N}$ as a surrogate for $\mathbf{d}$, so that the term containing $\mathbf{D}$ and the term containing $\mathbf{d}$ are decoupled. Problem (5) can then be equivalently restated as problem (8), and solving (5) is converted to solving (8). The ALM algorithm solves problem (8) by forming its Lagrangian function, choosing $\mathbf{Y}_1$, $\mathbf{Y}_2$, $\mu_1$ and $\mu_2$ judiciously, and then minimizing the Lagrangian function.
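For readers unfamiliar with ALM, the following Python sketch shows the widely used inexact ALM iteration for the plain low-rank plus sparse decomposition, i.e., the relaxed problem without the piecewise-continuous regularization term and without the slack vector. It is therefore a simplified illustration and not the full solver of (8). The function names, the default weight `lam`, and the schedule for the penalty parameter `mu` follow common practice for this class of algorithms and are assumptions here, as is the name `d0` for the initial depth-map.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entry-wise shrinkage: proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(x, tau):
    """Shrink singular values: proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def rpca_inexact_alm(d0, lam=None, iterations=200, tol=1e-7):
    """Split d0 into a low-rank part d and a sparse part e with d0 = d + e."""
    h, w = d0.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(h, w))      # conventional default, assumed
    norm_two = np.linalg.norm(d0, 2)
    y = d0 / max(norm_two, np.abs(d0).max() / lam)   # Lagrange multiplier
    mu, rho = 1.25 / norm_two, 1.5
    mu_max = mu * 1e7
    d = np.zeros_like(d0)
    e = np.zeros_like(d0)
    d0_norm = np.linalg.norm(d0)
    for _ in range(iterations):
        d = singular_value_threshold(d0 - e + y / mu, 1.0 / mu)
        e = soft_threshold(d0 - d + y / mu, lam / mu)
        residual = d0 - d - e
        y = y + mu * residual               # multiplier update
        mu = min(rho * mu, mu_max)          # tighten the penalty
        if np.linalg.norm(residual) / d0_norm < tol:
            break
    return d, e

# Synthetic check: rank-2 matrix plus 5% large outliers.
rng = np.random.default_rng(1)
low = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 80))
sparse = np.zeros((80, 80))
idx = rng.random((80, 80)) < 0.05
sparse[idx] = rng.uniform(-10.0, 10.0, size=idx.sum())
d, e = rpca_inexact_alm(low + sparse)
print("relative error of low-rank part:", np.linalg.norm(d - low) / np.linalg.norm(low))
```

Extending this loop to the regularized model would add an update for the slack vector and a second multiplier and penalty pair, which is the role played by $\mathbf{Y}_2$ and $\mu_2$ above.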

Results and discussion
In this section, we report experimental results comparing our approach with Random Walks (RW) [10], Graph Cuts (GC) [12], and hybrid Graph Cuts and Random Walks (HGR) [12]. We also run our method with different weights of the regularization term to assess how much the term improves the low-rank method. The test images in [12] are used for comparison.

Comparisons with existing methods
We first compare the proposed method with three leading algorithms for semi-automatic depth estimation. Figs. 2-4 compare the depth estimation results for several test images. In each figure, panel (a) shows the input color image, panel (b) the user scribbles overlaid on the original image, panel (c) the depth-map generated by GC [12], panel (d) the result of RW [10], panel (e) the depth from HGR [12], and panel (f) the depth generated by our approach. For Figs. 2-4, we set the weight of the outlier term in (5) to 10 and the weight of the regularization term to 100.
As shown in Fig. 2, the depth from our method captures the continuous changes of the scene. The gradual depth transition inside the fish and the reeds is visible in our result. In contrast, the depth variation of the left marked reed is lost in the results of RW [10], GC [12], and HGR [12]. The continuous depth transition in the region of the fish's head is also lost with these methods. With our approach, more of the depth transition between object boundaries is retained in Fig. 2.
As shown in Fig. 3, with RW [10], GC [12], and HGR [12], the bottom-left and bottom-right corners of the depth-map are too light, which makes the changes between these regions and their neighbors abrupt. Moreover, the depth discontinuity between the two men is lost with HGR. With our method, the depth transitions in those regions are more continuous, and the boundaries of the two men in Fig. 3 are more recognizable than with the other three approaches.
As shown in Fig. 4, there are only two user scribbles on the grass. As a result, the depth-maps of RW [10], GC [12], and HGR [12] change abruptly from bottom to top on the grass. In our result, the depth changes are gradual. In addition, the tower boundaries are clearly delineated by our approach. RW [10] and HGR [12] fail to recover the depth of small structures of the tower; e.g., the pillars on the left ground level are lost in Fig. 4(d) and 4(e).

Improvement of low-rank method by regularization
The "Bowling" image from the Middlebury stereo evaluation database is used to evaluate how much the regularization term improves the low-rank method. Since the ground-truth depth of the "Bowling" image is available, the PSNR is used to measure the improvement. We fix the weight of the outlier term at 10 and gradually increase the weight of the regularization term from 0 to 200. As shown in Fig. 5, the PSNR rises with the weight of the regularization term.
Although the PSNR improves as the regularization weight increases, the object boundaries also begin to blur. Fig. 6 shows examples of the recovered depth for different settings of the regularization weight. From Fig. 6, we notice that the object boundaries inside the red box are blurred when the weight is 1000. To make the differences clearly visible, the outlier term $\mathbf{E}$ for each weight is scaled by a factor of 5 in Fig. 6. The outlier terms in Fig. 6 show that more and more details are removed from the initial estimated depth as the weight increases. In our experiments, we find that a weight of 100 is a proper choice that trades off object boundaries against depth uniformity.
Table 1 shows the PSNR of RW [10], GC [12], and HGR [12]. Fig. 5 and Table 1 illustrate that, even without the regularization term, our method improves the PSNR by more than 1 dB compared with RW [10], GC [12], and HGR [12].
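For reference, the PSNR between a ground-truth depth-map and an estimate can be computed as in the Python sketch below. The function name and the default peak value of 255, which assumes 8-bit depth-maps, are assumptions of this illustration.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")      # identical inputs
    return 10.0 * np.log10(peak**2 / mse)

# A uniform error of one gray level gives about 48.13 dB.
truth = np.full((4, 4), 100.0)
print(psnr(truth, truth + 1.0))
```

Because the PSNR averages the squared error over all pixels, it rewards uniform depth inside objects more than sharp boundaries, which is consistent with the boundary blurring observed at large regularization weights.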

Discussion
In semi-automatic 2D-to-3D conversion, making the depth transition between user-marked regions continuous when user scribbles are sparse is a challenge. Zhuo has shown that matting Laplacian based sparse interpolation can capture the continuous changes of the depth-map [15]. However, matting Laplacian interpolation introduces texture details from the input color image into the depth-map. We argue that the discontinuity preserving, smoothness constrained low-rank method offers one promising approach to removing the texture information introduced by the interpolation.
A weakness of our method is that a sub-region within an object receives a lower depth than its neighboring regions when its color is darker than theirs. For example, on the right side of Fig. 3(f), the depth of the dog's neck is lower than that of the neighboring regions. This is because the matting Laplacian sparse interpolation assumes that depth is a linear function of image colors in a small window, and in the input color image this region is darker than the other parts of the dog. The region is too large to be corrected by the low-rank method. Single-view depth cues could possibly be introduced to fix this issue.

Conclusion
This study focuses on the continuous depth transition in semi-automatic 2D-to-3D conversion when user scribbles are sparse. Based on the view that depth is piecewise-continuous, we obtain the initial dense depth-map from user scribbles by matting Laplacian sparse interpolation. By treating depth-map refinement as a low-rank matrix recovery problem, we develop a discontinuity preserving, smoothness regularized low-rank method to remove the texture details introduced by sparse interpolation. The experimental results demonstrate that the depth transition between sparsely marked regions is more continuous with our method than with existing state-of-the-art methods.
To the best of our knowledge, this is the first time that a low-rank method has been used for 2D-to-3D conversion. There are several ways to improve the method. Since the initial samples are vital for low-rank matrix recovery, we can make use of existing depth cues to refine the sparse interpolated result. Other regularization terms can also be incorporated into the low-rank method in the same way. Moreover, we can introduce an error term to make the low-rank method more robust to outliers.