1 From 359a25fe7208803d997488a96714c19e84b74f96 Mon Sep 17 00:00:00 2001
2 From: John Cox <jc@kynesim.co.uk>
3 Date: Thu, 5 Mar 2020 18:30:41 +0000
4 Subject: [PATCH] staging: media: rpivid: Add Raspberry Pi V4L2 H265
5 decoder
6
7 This driver is for the HEVC/H265 decoder block on the Raspberry
8 Pi 4, and conforms to the V4L2 stateless decoder API.
9
10 Signed-off-by: John Cox <jc@kynesim.co.uk>
11
12 staging: media: rpivid: Select MEDIA_CONTROLLER and MEDIA_CONTROLLER_REQUEST_API
13
14 MEDIA_CONTROLLER_REQUEST_API is a hidden option. If rpivid depends on it,
15 the user would need to first enable another driver that selects
16 MEDIA_CONTROLLER_REQUEST_API, and only then rpivid would become available.
17
18 By selecting it instead of depending on it, it becomes possible to enable
19 rpivid without having to enable other potentially unnecessary drivers.
20
21 Signed-off-by: Hristo Venev <hristo@venev.name>
22
23 rpivid_h265: Fix width/height typo
24
25 Signed-off-by: popcornmix <popcornmix@gmail.com>
26
27 rpivid_h265: Fix build warnings
28
29 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
30
31 staging: rpivid: Fix crash when CMA alloc fails
32
33 If realloc to increase coeff size fails then attempt to re-allocate
34 the original size. If that also fails then flag a fatal error to abort
35 all further decode.
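
The fallback described above can be sketched in plain C. This is only an illustration under stated assumptions: `demo_buf`, `demo_alloc`, `demo_alloc_limit` and `demo_buf_grow` are hypothetical names, and `malloc` stands in for the CMA-backed allocation the driver really uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a coefficient buffer; not the driver's type. */
struct demo_buf {
	void *ptr;
	size_t size;
};

/* Simulated allocator: fails for any request above demo_alloc_limit. */
size_t demo_alloc_limit = (size_t)-1;

void *demo_alloc(size_t size)
{
	return size > demo_alloc_limit ? NULL : malloc(size);
}

/*
 * Replace buf with a buffer of new_size. In this sketch the old buffer is
 * released first, so a failed grow has to re-allocate the original size.
 * Returns 0 on success and -1 on failure; *fatal_err is set only when the
 * original size cannot be recovered either.
 */
int demo_buf_grow(struct demo_buf *buf, size_t new_size, bool *fatal_err)
{
	const size_t old_size = buf->size;

	free(buf->ptr);
	buf->ptr = demo_alloc(new_size);
	if (buf->ptr) {
		buf->size = new_size;
		return 0;
	}

	/* Growth failed: try to get the original size back. */
	buf->ptr = demo_alloc(old_size);
	if (buf->ptr) {
		buf->size = old_size;
		return -1;
	}

	/* Nothing left to decode into: flag the error as fatal. */
	buf->size = 0;
	*fatal_err = true;
	return -1;
}
```

A caller would treat a -1 return with `*fatal_err` still false as a failure of the current job only, and a set `*fatal_err` as the signal to abort all further decode.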
36
37 Signed-off-by: John Cox <jc@kynesim.co.uk>
38
39 rpivid: Request maximum hevc clock
40
41 Query maximum and minimum clock from driver
42 and use those
43
44 Signed-off-by: Dom Cobley <popcornmix@gmail.com>
45
46 rpivid: Switch to new clock api
47
48 Signed-off-by: Dom Cobley <popcornmix@gmail.com>
49
50 rpivid: Only clk_request_done once
51
52 Fixes: 25486f49bfe2e3ae13b90478d1eebd91413136ad
53 Signed-off-by: Dom Cobley <popcornmix@gmail.com>
54
55 media: rpivid: Remove the need to have num_entry_points set
56
57 VAAPI H265 has num entry points but never sets it. Allow a VAAPI
58 shim to work without requiring rewriting the VAAPI driver.
59 num_entry_points can be calculated from the slice_segment_addr
60 of the next slice so delay processing until we have that.
61
62 Also includes some minor cosmetics.
63
64 Signed-off-by: John Cox <jc@kynesim.co.uk>
65
66 media: rpivid: Convert to MPLANE
67
68 Use multi-planar interface rather than single plane interface. This
69 allows dmabufs holding compressed data to be resized.
70
71 Signed-off-by: John Cox <jc@kynesim.co.uk>
72
73 media: rpivid: Add an enable count to irq claim Qs
74
75 Add an enable count to the irq Q structures to allow the irq logic to
76 block further callbacks if resources associated with the irq are not
77 yet available.
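
As a rough illustration of the enable count, the sketch below follows the semantics given in the header comment later in this patch (-1 always OK, 0 do not schedule, positive values schedule and count down). The names `demo_irq_ctrl`, `demo_irq_may_sched` and `demo_irq_add_credit` are hypothetical, and the spinlock that protects the real structure is omitted.

```c
#include <stdbool.h>

#define DEMO_ENABLE_UNLIMITED (-1)

/* Hypothetical, lock-free stand-in for the irq claim queue control. */
struct demo_irq_ctrl {
	/* -1: always OK, 0: do not schedule, >0: schedule and count down */
	int enable;
};

/* Returns true if a callback may run now; consumes one credit if counted. */
bool demo_irq_may_sched(struct demo_irq_ctrl *ic)
{
	if (ic->enable == DEMO_ENABLE_UNLIMITED)
		return true;
	if (ic->enable == 0)
		return false;
	ic->enable--;
	return true;
}

/* Allow n further callbacks, e.g. once n resources have become available. */
void demo_irq_add_credit(struct demo_irq_ctrl *ic, int n)
{
	if (ic->enable != DEMO_ENABLE_UNLIMITED)
		ic->enable += n;
}
```

With the count at 0 the queue simply holds its claims; each credit added later releases exactly one callback.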
78
79 Signed-off-by: John Cox <jc@kynesim.co.uk>
80
81 media: rpivid: Add a Pass0 to accumulate slices and rework job finish
82
83 Due to overheads in assembling controls and requests it is worth having
84 the slice assembly phase separate from the h/w pass1 processing. Create
85 a queue to service pass1 rather than have the pass1 finished callback
86 trigger the next slice job.
87
88 This requires a rework of the logic that splits up the buffer and
89 request done events. This code contains two ways of doing that; we use
90 Ezequiel Garcia's <ezequiel@collabora.com> solution, but expect that
91 in the future this will be handled by the framework in a cleaner manner.
92
93 Fix up the handling of some of the memory exhaustion crashes uncovered
94 in the process of writing this code.
95
96 Signed-off-by: John Cox <jc@kynesim.co.uk>
97
98 media: rpivid: Map cmd buffer directly
99
100 It is unnecessary to have a separate dmabuf to hold the cmd buffer.
101 Map it directly from the kmalloc.
102
103 Signed-off-by: John Cox <jc@kynesim.co.uk>
104
105 media: rpivid: Improve values returned when setting output format
106
107 Guess a better value for the compressed bitstream buffer size
108
109 Signed-off-by: John Cox <jc@kynesim.co.uk>
110
111 media: rpivid: Improve stream_on/off conformance & clock setup
112
113 Fix stream on & off such that failures leave the driver in the correct
114 state. Ensure that the clock is on when we are streaming and off when
115 all contexts attached to this device have stopped streaming.
116
117 Signed-off-by: John Cox <jc@kynesim.co.uk>
118
119 media: rpivid: Improve SPS/PPS error handling/validation
120
121 Move size and width checking from bitstream processing to control
122 validation
123
124 Signed-off-by: John Cox <jc@kynesim.co.uk>
125
126 media: rpivid: Fix H265 aux ent reuse of the same slot
127
128 It is legitimate, though unusual, for an aux ent associated with a slot
129 to be selected in phase 0 before a previous selection has been used and
130 released in phase 2. Fix this so that, if the slot is found to be in use,
131 the aux ent associated with it is reused rather than a new aux
132 ent being created. This fixes a problem where, when the first aux ent
133 was released, track of the second was lost.
134
135 This bug was spotted in Nick's testing. It may explain some other occasional,
136 unreliable decode error reports where dmesg included "Missing DPB AUX
137 ent" logging.
138
139 Signed-off-by: John Cox <jc@kynesim.co.uk>
140
141 media: rpivid: Update to compile with new hevc decode params
142
143 DPB entries have moved from slice params to the new decode params
144 attribute - update to deal with this. Also fixes fallthrough
145 warnings which seem to be new in 5.14.
146
147 Signed-off-by: John Cox <jc@kynesim.co.uk>
148
149 media: rpivid: Make slice ctrl dynamic
150
151 Allows the user to submit a whole frame's worth of slice headers in
152 one lump along with a single bitstream dmabuf for the whole lot.
153 This saves potentially a lot of bitstream copying.
154
155 Signed-off-by: John Cox <jc@kynesim.co.uk>
156
157 media: rpivid: Only create aux entries for H265 if needed
158
159 Only create aux entries of mv info for frames where that info might
160 be used by a later frame. This saves some memory bandwidth and
161 potentially some memory.
162
163 Signed-off-by: John Cox <jc@kynesim.co.uk>
164
165 media: rpivid: Avoid returning EINVAL to a G_FMT ioctl
166
167 V4L2 spec says that G/S/TRY_FMT IOCTLs should never return errors for
168 anything other than wrong buffer types. Improve the capture format
169 function such that this is so and unsupported values get converted
170 to supported ones properly.
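
The "convert rather than reject" behaviour can be sketched as follows. This is not the driver's capture format function: the format codes, the size limits and the names `demo_pix` and `demo_try_fmt` are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical format codes; the real driver uses V4L2 fourccs. */
enum demo_fmt { DEMO_FMT_A = 1, DEMO_FMT_B = 2 };

/* Hypothetical size limits, chosen only for this example. */
#define DEMO_MIN_DIM 16u
#define DEMO_MAX_DIM 4096u

struct demo_pix {
	uint32_t pixelformat;
	uint32_t width;
	uint32_t height;
};

static uint32_t demo_clamp(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Never fails: unsupported values are replaced with supported ones. */
void demo_try_fmt(struct demo_pix *pix)
{
	if (pix->pixelformat != DEMO_FMT_A && pix->pixelformat != DEMO_FMT_B)
		pix->pixelformat = DEMO_FMT_A;	/* fall back to a default */

	pix->width = demo_clamp(pix->width, DEMO_MIN_DIM, DEMO_MAX_DIM);
	pix->height = demo_clamp(pix->height, DEMO_MIN_DIM, DEMO_MAX_DIM);
}
```

Because the function returns nothing, the only error left for the ioctl handler to report is a wrong buffer type, which matches the rule quoted from the spec.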
171
172 Signed-off-by: John Cox <jc@kynesim.co.uk>
173
174 media: rpivid: Remove unused ctx state variable and defines
175
176 Remove unused ctx state tracking variable and associated defines.
177 Their presence implies they might be used, but they aren't.
178
179 Signed-off-by: John Cox <jc@kynesim.co.uk>
180
181 media: rpivid: Ensure IRQs have completed before uniniting context
182
183 Before uniniting the decode context, sync with the IRQ queues to ensure
184 that decode no longer has any buffers in use. This fixes a problem that
185 manifested as ffmpeg leaking CMA buffers when it did a stream off on
186 OUTPUT before CAPTURE, though in reality it was probably much more
187 dangerous than that.
188
189 Signed-off-by: John Cox <jc@kynesim.co.uk>
190
191 media: rpivid: remove min_buffers_needed from src queue
192
193 Remove min_buffers_needed=1 from src queue init. Src buffers are bound
194 to media requests therefore this setting is not needed and generates
195 a WARN in kernel 5.16.
196
197 Signed-off-by: John Cox <jc@kynesim.co.uk>
198
199 rpivid: Use clk_get_max_rate()
200
201 The driver was using clk_round_rate() to figure out the maximum clock
202 rate that was allowed for the HEVC clock.
203
204 Since we have a function to return it directly now, let's use it.
205
206 Signed-off-by: Maxime Ripard <maxime@cerno.tech>
207
208 media: rpivid: Apply V4L2 stateless API changes
209
210 media: rpivid: Fix fallthrough warning
211
212 Replace old-style /* FALLTHRU */ with fallthrough;
213
214 Signed-off-by: John Cox <jc@kynesim.co.uk>
215
216 media: rpivid: Set min value as well as max for HEVC_DECODE_MODE
217
218 As only one value can be accepted set both min and max to that value.
219
220 Signed-off-by: John Cox <jc@kynesim.co.uk>
221
222 media: rpivid: Accept ANNEX_B start codes
223
224 Allow the START_CODE control to take ANNEX_B as a value. This makes no
225 difference to any part of the decode process as the added bytes are in
226 data that we ignore. This helps my testing and may help userland code
227 that expects to send those bytes.
228
229 Signed-off-by: John Cox <jc@kynesim.co.uk>
230
231 rpivid: Convert to new clock rate API
232
233 Signed-off-by: Maxime Ripard <maxime@cerno.tech>
234 ---
235 drivers/media/v4l2-core/v4l2-mem2mem.c | 2 -
236 drivers/staging/media/Kconfig | 2 +
237 drivers/staging/media/Makefile | 2 +-
238 drivers/staging/media/rpivid/Kconfig | 16 +
239 drivers/staging/media/rpivid/Makefile | 5 +
240 drivers/staging/media/rpivid/rpivid.c | 459 ++++
241 drivers/staging/media/rpivid/rpivid.h | 203 ++
242 drivers/staging/media/rpivid/rpivid_dec.c | 96 +
243 drivers/staging/media/rpivid/rpivid_dec.h | 19 +
244 drivers/staging/media/rpivid/rpivid_h265.c | 2698 +++++++++++++++++++
245 drivers/staging/media/rpivid/rpivid_hw.c | 383 +++
246 drivers/staging/media/rpivid/rpivid_hw.h | 303 +++
247 drivers/staging/media/rpivid/rpivid_video.c | 696 +++++
248 drivers/staging/media/rpivid/rpivid_video.h | 33 +
249 14 files changed, 4914 insertions(+), 3 deletions(-)
250 create mode 100644 drivers/staging/media/rpivid/Kconfig
251 create mode 100644 drivers/staging/media/rpivid/Makefile
252 create mode 100644 drivers/staging/media/rpivid/rpivid.c
253 create mode 100644 drivers/staging/media/rpivid/rpivid.h
254 create mode 100644 drivers/staging/media/rpivid/rpivid_dec.c
255 create mode 100644 drivers/staging/media/rpivid/rpivid_dec.h
256 create mode 100644 drivers/staging/media/rpivid/rpivid_h265.c
257 create mode 100644 drivers/staging/media/rpivid/rpivid_hw.c
258 create mode 100644 drivers/staging/media/rpivid/rpivid_hw.h
259 create mode 100644 drivers/staging/media/rpivid/rpivid_video.c
260 create mode 100644 drivers/staging/media/rpivid/rpivid_video.h
261
262 --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
263 +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
264 @@ -492,8 +492,6 @@ void v4l2_m2m_job_finish(struct v4l2_m2m
265 * holding capture buffers. Those should use
266 * v4l2_m2m_buf_done_and_job_finish() instead.
267 */
268 - WARN_ON(m2m_ctx->out_q_ctx.q.subsystem_flags &
269 - VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF);
270 spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
271 schedule_next = _v4l2_m2m_job_finish(m2m_dev, m2m_ctx);
272 spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
273 --- a/drivers/staging/media/Kconfig
274 +++ b/drivers/staging/media/Kconfig
275 @@ -34,6 +34,8 @@ source "drivers/staging/media/omap4iss/K
276
277 source "drivers/staging/media/rkvdec/Kconfig"
278
279 +source "drivers/staging/media/rpivid/Kconfig"
280 +
281 source "drivers/staging/media/sunxi/Kconfig"
282
283 source "drivers/staging/media/tegra-video/Kconfig"
284 --- a/drivers/staging/media/Makefile
285 +++ b/drivers/staging/media/Makefile
286 @@ -7,7 +7,7 @@ obj-$(CONFIG_VIDEO_MESON_VDEC) += meson/
287 obj-$(CONFIG_VIDEO_MEYE) += deprecated/meye/
288 obj-$(CONFIG_VIDEO_OMAP4) += omap4iss/
289 obj-$(CONFIG_VIDEO_ROCKCHIP_VDEC) += rkvdec/
290 -obj-$(CONFIG_VIDEO_STKWEBCAM) += deprecated/stkwebcam/
291 +obj-$(CONFIG_VIDEO_RPIVID) += rpivid/
292 obj-$(CONFIG_VIDEO_SUNXI) += sunxi/
293 obj-$(CONFIG_VIDEO_TEGRA) += tegra-video/
294 obj-$(CONFIG_VIDEO_IPU3_IMGU) += ipu3/
295 --- /dev/null
296 +++ b/drivers/staging/media/rpivid/Kconfig
297 @@ -0,0 +1,16 @@
298 +# SPDX-License-Identifier: GPL-2.0
299 +
300 +config VIDEO_RPIVID
301 + tristate "Rpi H265 driver"
302 + depends on VIDEO_DEV
303 + depends on OF
304 + select MEDIA_CONTROLLER
305 + select MEDIA_CONTROLLER_REQUEST_API
306 + select VIDEOBUF2_DMA_CONTIG
307 + select V4L2_MEM2MEM_DEV
308 + help
309 + Support for the Rpi H265 h/w decoder.
310 +
311 + To compile this driver as a module, choose M here: the module
312 + will be called rpivid-hevc.
313 +
314 --- /dev/null
315 +++ b/drivers/staging/media/rpivid/Makefile
316 @@ -0,0 +1,5 @@
317 +# SPDX-License-Identifier: GPL-2.0
318 +obj-$(CONFIG_VIDEO_RPIVID) += rpivid-hevc.o
319 +
320 +rpivid-hevc-y = rpivid.o rpivid_video.o rpivid_dec.o \
321 + rpivid_hw.o rpivid_h265.o
322 --- /dev/null
323 +++ b/drivers/staging/media/rpivid/rpivid.c
324 @@ -0,0 +1,459 @@
325 +// SPDX-License-Identifier: GPL-2.0
326 +/*
327 + * Raspberry Pi HEVC driver
328 + *
329 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
330 + *
331 + * Based on the Cedrus VPU driver, that is:
332 + *
333 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
334 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
335 + * Copyright (C) 2018 Bootlin
336 + */
337 +
338 +#include <linux/platform_device.h>
339 +#include <linux/module.h>
340 +#include <linux/of.h>
341 +
342 +#include <media/v4l2-device.h>
343 +#include <media/v4l2-ioctl.h>
344 +#include <media/v4l2-ctrls.h>
345 +#include <media/v4l2-mem2mem.h>
346 +
347 +#include "rpivid.h"
348 +#include "rpivid_video.h"
349 +#include "rpivid_hw.h"
350 +#include "rpivid_dec.h"
351 +
352 +/*
353 + * Default /dev/videoN node number.
354 + * Deliberately avoid the very low numbers as these are often taken by webcams
355 + * etc, and simple apps tend to only go for /dev/video0.
356 + */
357 +static int video_nr = 19;
358 +module_param(video_nr, int, 0644);
359 +MODULE_PARM_DESC(video_nr, "decoder video device number");
360 +
361 +static const struct rpivid_control rpivid_ctrls[] = {
362 + {
363 + .cfg = {
364 + .id = V4L2_CID_STATELESS_HEVC_SPS,
365 + .ops = &rpivid_hevc_sps_ctrl_ops,
366 + },
367 + .required = true,
368 + },
369 + {
370 + .cfg = {
371 + .id = V4L2_CID_STATELESS_HEVC_PPS,
372 + .ops = &rpivid_hevc_pps_ctrl_ops,
373 + },
374 + .required = true,
375 + },
376 + {
377 + .cfg = {
378 + .id = V4L2_CID_STATELESS_HEVC_SCALING_MATRIX,
379 + },
380 + .required = false,
381 + },
382 + {
383 + .cfg = {
384 + .id = V4L2_CID_STATELESS_HEVC_DECODE_PARAMS,
385 + },
386 + .required = true,
387 + },
388 + {
389 + .cfg = {
390 + .name = "Slice param array",
391 + .id = V4L2_CID_STATELESS_HEVC_SLICE_PARAMS,
392 + .type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS,
393 + .flags = V4L2_CTRL_FLAG_DYNAMIC_ARRAY,
394 + .dims = { 0x1000 },
395 + },
396 + .required = true,
397 + },
398 + {
399 + .cfg = {
400 + .id = V4L2_CID_STATELESS_HEVC_DECODE_MODE,
401 + .min = V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
402 + .max = V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
403 + .def = V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED,
404 + },
405 + .required = false,
406 + },
407 + {
408 + .cfg = {
409 + .id = V4L2_CID_STATELESS_HEVC_START_CODE,
410 + .min = V4L2_STATELESS_HEVC_START_CODE_NONE,
411 + .max = V4L2_STATELESS_HEVC_START_CODE_ANNEX_B,
412 + .def = V4L2_STATELESS_HEVC_START_CODE_NONE,
413 + },
414 + .required = false,
415 + },
416 +};
417 +
418 +#define rpivid_ctrls_COUNT ARRAY_SIZE(rpivid_ctrls)
419 +
420 +struct v4l2_ctrl *rpivid_find_ctrl(struct rpivid_ctx *ctx, u32 id)
421 +{
422 + unsigned int i;
423 +
424 + for (i = 0; ctx->ctrls[i]; i++)
425 + if (ctx->ctrls[i]->id == id)
426 + return ctx->ctrls[i];
427 +
428 + return NULL;
429 +}
430 +
431 +void *rpivid_find_control_data(struct rpivid_ctx *ctx, u32 id)
432 +{
433 + struct v4l2_ctrl *const ctrl = rpivid_find_ctrl(ctx, id);
434 +
435 + return !ctrl ? NULL : ctrl->p_cur.p;
436 +}
437 +
438 +static int rpivid_init_ctrls(struct rpivid_dev *dev, struct rpivid_ctx *ctx)
439 +{
440 + struct v4l2_ctrl_handler *hdl = &ctx->hdl;
441 + struct v4l2_ctrl *ctrl;
442 + unsigned int ctrl_size;
443 + unsigned int i;
444 +
445 + v4l2_ctrl_handler_init(hdl, rpivid_ctrls_COUNT);
446 + if (hdl->error) {
447 + v4l2_err(&dev->v4l2_dev,
448 + "Failed to initialize control handler\n");
449 + return hdl->error;
450 + }
451 +
452 + ctrl_size = sizeof(ctrl) * (rpivid_ctrls_COUNT + 1);
453 +
454 + ctx->ctrls = kzalloc(ctrl_size, GFP_KERNEL);
455 + if (!ctx->ctrls)
456 + return -ENOMEM;
457 +
458 + for (i = 0; i < rpivid_ctrls_COUNT; i++) {
459 + ctrl = v4l2_ctrl_new_custom(hdl, &rpivid_ctrls[i].cfg,
460 + ctx);
461 + if (hdl->error) {
462 + v4l2_err(&dev->v4l2_dev,
463 + "Failed to create new custom control id=%#x\n",
464 + rpivid_ctrls[i].cfg.id);
465 +
466 + v4l2_ctrl_handler_free(hdl);
467 + kfree(ctx->ctrls);
468 + return hdl->error;
469 + }
470 +
471 + ctx->ctrls[i] = ctrl;
472 + }
473 +
474 + ctx->fh.ctrl_handler = hdl;
475 + v4l2_ctrl_handler_setup(hdl);
476 +
477 + return 0;
478 +}
479 +
480 +static int rpivid_request_validate(struct media_request *req)
481 +{
482 + struct media_request_object *obj;
483 + struct v4l2_ctrl_handler *parent_hdl, *hdl;
484 + struct rpivid_ctx *ctx = NULL;
485 + struct v4l2_ctrl *ctrl_test;
486 + unsigned int count;
487 + unsigned int i;
488 +
489 + list_for_each_entry(obj, &req->objects, list) {
490 + struct vb2_buffer *vb;
491 +
492 + if (vb2_request_object_is_buffer(obj)) {
493 + vb = container_of(obj, struct vb2_buffer, req_obj);
494 + ctx = vb2_get_drv_priv(vb->vb2_queue);
495 +
496 + break;
497 + }
498 + }
499 +
500 + if (!ctx)
501 + return -ENOENT;
502 +
503 + count = vb2_request_buffer_cnt(req);
504 + if (!count) {
505 + v4l2_info(&ctx->dev->v4l2_dev,
506 + "No buffer was provided with the request\n");
507 + return -ENOENT;
508 + } else if (count > 1) {
509 + v4l2_info(&ctx->dev->v4l2_dev,
510 + "More than one buffer was provided with the request\n");
511 + return -EINVAL;
512 + }
513 +
514 + parent_hdl = &ctx->hdl;
515 +
516 + hdl = v4l2_ctrl_request_hdl_find(req, parent_hdl);
517 + if (!hdl) {
518 + v4l2_info(&ctx->dev->v4l2_dev, "Missing codec control(s)\n");
519 + return -ENOENT;
520 + }
521 +
522 + for (i = 0; i < rpivid_ctrls_COUNT; i++) {
523 + if (!rpivid_ctrls[i].required)
524 + continue;
525 +
526 + ctrl_test =
527 + v4l2_ctrl_request_hdl_ctrl_find(hdl,
528 + rpivid_ctrls[i].cfg.id);
529 + if (!ctrl_test) {
530 + v4l2_info(&ctx->dev->v4l2_dev,
531 + "Missing required codec control\n");
532 + v4l2_ctrl_request_hdl_put(hdl);
533 + return -ENOENT;
534 + }
535 + }
536 +
537 + v4l2_ctrl_request_hdl_put(hdl);
538 +
539 + return vb2_request_validate(req);
540 +}
541 +
542 +static int rpivid_open(struct file *file)
543 +{
544 + struct rpivid_dev *dev = video_drvdata(file);
545 + struct rpivid_ctx *ctx = NULL;
546 + int ret;
547 +
548 + if (mutex_lock_interruptible(&dev->dev_mutex))
549 + return -ERESTARTSYS;
550 +
551 + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
552 + if (!ctx) {
553 + /* dev_mutex is released at err_unlock */
554 + ret = -ENOMEM;
555 + goto err_unlock;
556 + }
557 +
558 + mutex_init(&ctx->ctx_mutex);
559 +
560 + v4l2_fh_init(&ctx->fh, video_devdata(file));
561 + file->private_data = &ctx->fh;
562 + ctx->dev = dev;
563 +
564 + ret = rpivid_init_ctrls(dev, ctx);
565 + if (ret)
566 + goto err_free;
567 +
568 + ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(dev->m2m_dev, ctx,
569 + &rpivid_queue_init);
570 + if (IS_ERR(ctx->fh.m2m_ctx)) {
571 + ret = PTR_ERR(ctx->fh.m2m_ctx);
572 + goto err_ctrls;
573 + }
574 +
575 + /* The only bit of format info that we can guess now is H265 src
576 + * Everything else we need more info for
577 + */
578 + rpivid_prepare_src_format(&ctx->src_fmt);
579 +
580 + v4l2_fh_add(&ctx->fh);
581 +
582 + mutex_unlock(&dev->dev_mutex);
583 +
584 + return 0;
585 +
586 +err_ctrls:
587 + v4l2_ctrl_handler_free(&ctx->hdl);
588 +err_free:
589 + mutex_destroy(&ctx->ctx_mutex);
590 + kfree(ctx);
591 +err_unlock:
592 + mutex_unlock(&dev->dev_mutex);
593 +
594 + return ret;
595 +}
596 +
597 +static int rpivid_release(struct file *file)
598 +{
599 + struct rpivid_dev *dev = video_drvdata(file);
600 + struct rpivid_ctx *ctx = container_of(file->private_data,
601 + struct rpivid_ctx, fh);
602 +
603 + mutex_lock(&dev->dev_mutex);
604 +
605 + v4l2_fh_del(&ctx->fh);
606 + v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
607 +
608 + v4l2_ctrl_handler_free(&ctx->hdl);
609 + kfree(ctx->ctrls);
610 +
611 + v4l2_fh_exit(&ctx->fh);
612 + mutex_destroy(&ctx->ctx_mutex);
613 +
614 + kfree(ctx);
615 +
616 + mutex_unlock(&dev->dev_mutex);
617 +
618 + return 0;
619 +}
620 +
621 +static const struct v4l2_file_operations rpivid_fops = {
622 + .owner = THIS_MODULE,
623 + .open = rpivid_open,
624 + .release = rpivid_release,
625 + .poll = v4l2_m2m_fop_poll,
626 + .unlocked_ioctl = video_ioctl2,
627 + .mmap = v4l2_m2m_fop_mmap,
628 +};
629 +
630 +static const struct video_device rpivid_video_device = {
631 + .name = RPIVID_NAME,
632 + .vfl_dir = VFL_DIR_M2M,
633 + .fops = &rpivid_fops,
634 + .ioctl_ops = &rpivid_ioctl_ops,
635 + .minor = -1,
636 + .release = video_device_release_empty,
637 + .device_caps = V4L2_CAP_VIDEO_M2M_MPLANE | V4L2_CAP_STREAMING,
638 +};
639 +
640 +static const struct v4l2_m2m_ops rpivid_m2m_ops = {
641 + .device_run = rpivid_device_run,
642 +};
643 +
644 +static const struct media_device_ops rpivid_m2m_media_ops = {
645 + .req_validate = rpivid_request_validate,
646 + .req_queue = v4l2_m2m_request_queue,
647 +};
648 +
649 +static int rpivid_probe(struct platform_device *pdev)
650 +{
651 + struct rpivid_dev *dev;
652 + struct video_device *vfd;
653 + int ret;
654 +
655 + dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
656 + if (!dev)
657 + return -ENOMEM;
658 +
659 + dev->vfd = rpivid_video_device;
660 + dev->dev = &pdev->dev;
661 + dev->pdev = pdev;
662 +
663 + ret = 0;
664 + ret = rpivid_hw_probe(dev);
665 + if (ret) {
666 + dev_err(&pdev->dev, "Failed to probe hardware\n");
667 + return ret;
668 + }
669 +
670 + dev->dec_ops = &rpivid_dec_ops_h265;
671 +
672 + mutex_init(&dev->dev_mutex);
673 +
674 + ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
675 + if (ret) {
676 + dev_err(&pdev->dev, "Failed to register V4L2 device\n");
677 + return ret;
678 + }
679 +
680 + vfd = &dev->vfd;
681 + vfd->lock = &dev->dev_mutex;
682 + vfd->v4l2_dev = &dev->v4l2_dev;
683 +
684 + snprintf(vfd->name, sizeof(vfd->name), "%s", rpivid_video_device.name);
685 + video_set_drvdata(vfd, dev);
686 +
687 + dev->m2m_dev = v4l2_m2m_init(&rpivid_m2m_ops);
688 + if (IS_ERR(dev->m2m_dev)) {
689 + v4l2_err(&dev->v4l2_dev,
690 + "Failed to initialize V4L2 M2M device\n");
691 + ret = PTR_ERR(dev->m2m_dev);
692 +
693 + goto err_v4l2;
694 + }
695 +
696 + dev->mdev.dev = &pdev->dev;
697 + strscpy(dev->mdev.model, RPIVID_NAME, sizeof(dev->mdev.model));
698 + strscpy(dev->mdev.bus_info, "platform:" RPIVID_NAME,
699 + sizeof(dev->mdev.bus_info));
700 +
701 + media_device_init(&dev->mdev);
702 + dev->mdev.ops = &rpivid_m2m_media_ops;
703 + dev->v4l2_dev.mdev = &dev->mdev;
704 +
705 + ret = video_register_device(vfd, VFL_TYPE_VIDEO, video_nr);
706 + if (ret) {
707 + v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
708 + goto err_m2m;
709 + }
710 +
711 + v4l2_info(&dev->v4l2_dev,
712 + "Device registered as /dev/video%d\n", vfd->num);
713 +
714 + ret = v4l2_m2m_register_media_controller(dev->m2m_dev, vfd,
715 + MEDIA_ENT_F_PROC_VIDEO_DECODER);
716 + if (ret) {
717 + v4l2_err(&dev->v4l2_dev,
718 + "Failed to initialize V4L2 M2M media controller\n");
719 + goto err_video;
720 + }
721 +
722 + ret = media_device_register(&dev->mdev);
723 + if (ret) {
724 + v4l2_err(&dev->v4l2_dev, "Failed to register media device\n");
725 + goto err_m2m_mc;
726 + }
727 +
728 + platform_set_drvdata(pdev, dev);
729 +
730 + return 0;
731 +
732 +err_m2m_mc:
733 + v4l2_m2m_unregister_media_controller(dev->m2m_dev);
734 +err_video:
735 + video_unregister_device(&dev->vfd);
736 +err_m2m:
737 + v4l2_m2m_release(dev->m2m_dev);
738 +err_v4l2:
739 + v4l2_device_unregister(&dev->v4l2_dev);
740 +
741 + return ret;
742 +}
743 +
744 +static int rpivid_remove(struct platform_device *pdev)
745 +{
746 + struct rpivid_dev *dev = platform_get_drvdata(pdev);
747 +
748 + if (media_devnode_is_registered(dev->mdev.devnode)) {
749 + media_device_unregister(&dev->mdev);
750 + v4l2_m2m_unregister_media_controller(dev->m2m_dev);
751 + media_device_cleanup(&dev->mdev);
752 + }
753 +
754 + v4l2_m2m_release(dev->m2m_dev);
755 + video_unregister_device(&dev->vfd);
756 + v4l2_device_unregister(&dev->v4l2_dev);
757 +
758 + rpivid_hw_remove(dev);
759 +
760 + return 0;
761 +}
762 +
763 +static const struct of_device_id rpivid_dt_match[] = {
764 + {
765 + .compatible = "raspberrypi,rpivid-vid-decoder",
766 + },
767 + { /* sentinel */ }
768 +};
769 +MODULE_DEVICE_TABLE(of, rpivid_dt_match);
770 +
771 +static struct platform_driver rpivid_driver = {
772 + .probe = rpivid_probe,
773 + .remove = rpivid_remove,
774 + .driver = {
775 + .name = RPIVID_NAME,
776 + .of_match_table = of_match_ptr(rpivid_dt_match),
777 + },
778 +};
779 +module_platform_driver(rpivid_driver);
780 +
781 +MODULE_LICENSE("GPL v2");
782 +MODULE_AUTHOR("John Cox <jc@kynesim.co.uk>");
783 +MODULE_DESCRIPTION("Raspberry Pi HEVC V4L2 driver");
784 --- /dev/null
785 +++ b/drivers/staging/media/rpivid/rpivid.h
786 @@ -0,0 +1,203 @@
787 +/* SPDX-License-Identifier: GPL-2.0 */
788 +/*
789 + * Raspberry Pi HEVC driver
790 + *
791 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
792 + *
793 + * Based on the Cedrus VPU driver, that is:
794 + *
795 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
796 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
797 + * Copyright (C) 2018 Bootlin
798 + */
799 +
800 +#ifndef _RPIVID_H_
801 +#define _RPIVID_H_
802 +
803 +#include <linux/clk.h>
804 +#include <linux/platform_device.h>
805 +#include <media/v4l2-ctrls.h>
806 +#include <media/v4l2-device.h>
807 +#include <media/v4l2-mem2mem.h>
808 +#include <media/videobuf2-v4l2.h>
809 +#include <media/videobuf2-dma-contig.h>
810 +
811 +#define OPT_DEBUG_POLL_IRQ 0
812 +
813 +#define RPIVID_DEC_ENV_COUNT 6
814 +#define RPIVID_P1BUF_COUNT 3
815 +#define RPIVID_P2BUF_COUNT 3
816 +
817 +#define RPIVID_NAME "rpivid"
818 +
819 +#define RPIVID_CAPABILITY_UNTILED BIT(0)
820 +#define RPIVID_CAPABILITY_H265_DEC BIT(1)
821 +
822 +#define RPIVID_QUIRK_NO_DMA_OFFSET BIT(0)
823 +
824 +enum rpivid_irq_status {
825 + RPIVID_IRQ_NONE,
826 + RPIVID_IRQ_ERROR,
827 + RPIVID_IRQ_OK,
828 +};
829 +
830 +struct rpivid_control {
831 + struct v4l2_ctrl_config cfg;
832 + unsigned char required:1;
833 +};
834 +
835 +struct rpivid_h265_run {
836 + u32 slice_ents;
837 + const struct v4l2_ctrl_hevc_sps *sps;
838 + const struct v4l2_ctrl_hevc_pps *pps;
839 + const struct v4l2_ctrl_hevc_decode_params *dec;
840 + const struct v4l2_ctrl_hevc_slice_params *slice_params;
841 + const struct v4l2_ctrl_hevc_scaling_matrix *scaling_matrix;
842 +};
843 +
844 +struct rpivid_run {
845 + struct vb2_v4l2_buffer *src;
846 + struct vb2_v4l2_buffer *dst;
847 +
848 + struct rpivid_h265_run h265;
849 +};
850 +
851 +struct rpivid_buffer {
852 + struct v4l2_m2m_buffer m2m_buf;
853 +};
854 +
855 +struct rpivid_dec_state;
856 +struct rpivid_dec_env;
857 +
858 +struct rpivid_gptr {
859 + size_t size;
860 + __u8 *ptr;
861 + dma_addr_t addr;
862 + unsigned long attrs;
863 +};
864 +
865 +struct rpivid_dev;
866 +typedef void (*rpivid_irq_callback)(struct rpivid_dev *dev, void *ctx);
867 +
868 +struct rpivid_q_aux;
869 +#define RPIVID_AUX_ENT_COUNT VB2_MAX_FRAME
870 +
871 +struct rpivid_ctx {
872 + struct v4l2_fh fh;
873 + struct rpivid_dev *dev;
874 +
875 + struct v4l2_pix_format_mplane src_fmt;
876 + struct v4l2_pix_format_mplane dst_fmt;
877 + int dst_fmt_set;
878 +
879 + int src_stream_on;
880 + int dst_stream_on;
881 +
882 + // fatal_err is set if an error has occurred s.t. decode cannot
883 + // continue (such as running out of CMA)
884 + int fatal_err;
885 +
886 + /* Lock for queue operations */
887 + struct mutex ctx_mutex;
888 +
889 + struct v4l2_ctrl_handler hdl;
890 + struct v4l2_ctrl **ctrls;
891 +
892 + /* Decode state - stateless decoder my *** */
893 + /* state contains stuff that is only needed in phase0
894 + * it could be held in dec_env but that would be wasteful
895 + */
896 + struct rpivid_dec_state *state;
897 + struct rpivid_dec_env *dec0;
898 +
899 + /* Spinlock protecting dec_free */
900 + spinlock_t dec_lock;
901 + struct rpivid_dec_env *dec_free;
902 +
903 + struct rpivid_dec_env *dec_pool;
904 +
905 + unsigned int p1idx;
906 + atomic_t p1out;
907 + struct rpivid_gptr bitbufs[RPIVID_P1BUF_COUNT];
908 +
909 + /* *** Should be in dev *** */
910 + unsigned int p2idx;
911 + struct rpivid_gptr pu_bufs[RPIVID_P2BUF_COUNT];
912 + struct rpivid_gptr coeff_bufs[RPIVID_P2BUF_COUNT];
913 +
914 + /* Spinlock protecting aux_free */
915 + spinlock_t aux_lock;
916 + struct rpivid_q_aux *aux_free;
917 +
918 + struct rpivid_q_aux *aux_ents[RPIVID_AUX_ENT_COUNT];
919 +
920 + unsigned int colmv_stride;
921 + unsigned int colmv_picsize;
922 +};
923 +
924 +struct rpivid_dec_ops {
925 + void (*setup)(struct rpivid_ctx *ctx, struct rpivid_run *run);
926 + int (*start)(struct rpivid_ctx *ctx);
927 + void (*stop)(struct rpivid_ctx *ctx);
928 + void (*trigger)(struct rpivid_ctx *ctx);
929 +};
930 +
931 +struct rpivid_variant {
932 + unsigned int capabilities;
933 + unsigned int quirks;
934 + unsigned int mod_rate;
935 +};
936 +
937 +struct rpivid_hw_irq_ent;
938 +
939 +#define RPIVID_ICTL_ENABLE_UNLIMITED (-1)
940 +
941 +struct rpivid_hw_irq_ctrl {
942 + /* Spinlock protecting claim and tail */
943 + spinlock_t lock;
944 + struct rpivid_hw_irq_ent *claim;
945 + struct rpivid_hw_irq_ent *tail;
946 +
947 + /* Ent for pending irq - also prevents sched */
948 + struct rpivid_hw_irq_ent *irq;
949 + /* Non-zero => do not start a new job - outer layer sched pending */
950 + int no_sched;
951 + /* Enable count. -1 always OK, 0 do not sched, +ve sched & count down */
952 + int enable;
953 + /* Thread CB requested */
954 + bool thread_reqed;
955 +};
956 +
957 +struct rpivid_dev {
958 + struct v4l2_device v4l2_dev;
959 + struct video_device vfd;
960 + struct media_device mdev;
961 + struct media_pad pad[2];
962 + struct platform_device *pdev;
963 + struct device *dev;
964 + struct v4l2_m2m_dev *m2m_dev;
965 + const struct rpivid_dec_ops *dec_ops;
966 +
967 + /* Device file mutex */
968 + struct mutex dev_mutex;
969 +
970 + void __iomem *base_irq;
971 + void __iomem *base_h265;
972 +
973 + struct clk *clock;
974 + unsigned long max_clock_rate;
975 +
976 + int cache_align;
977 +
978 + struct rpivid_hw_irq_ctrl ic_active1;
979 + struct rpivid_hw_irq_ctrl ic_active2;
980 +};
981 +
982 +extern const struct rpivid_dec_ops rpivid_dec_ops_h265;
983 +extern const struct v4l2_ctrl_ops rpivid_hevc_sps_ctrl_ops;
984 +extern const struct v4l2_ctrl_ops rpivid_hevc_pps_ctrl_ops;
985 +
986 +struct v4l2_ctrl *rpivid_find_ctrl(struct rpivid_ctx *ctx, u32 id);
987 +void *rpivid_find_control_data(struct rpivid_ctx *ctx, u32 id);
988 +
989 +#endif
990 --- /dev/null
991 +++ b/drivers/staging/media/rpivid/rpivid_dec.c
992 @@ -0,0 +1,96 @@
993 +// SPDX-License-Identifier: GPL-2.0
994 +/*
995 + * Raspberry Pi HEVC driver
996 + *
997 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
998 + *
999 + * Based on the Cedrus VPU driver, that is:
1000 + *
1001 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
1002 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
1003 + * Copyright (C) 2018 Bootlin
1004 + */
1005 +
1006 +#include <media/v4l2-device.h>
1007 +#include <media/v4l2-ioctl.h>
1008 +#include <media/v4l2-event.h>
1009 +#include <media/v4l2-mem2mem.h>
1010 +
1011 +#include "rpivid.h"
1012 +#include "rpivid_dec.h"
1013 +
1014 +void rpivid_device_run(void *priv)
1015 +{
1016 + struct rpivid_ctx *const ctx = priv;
1017 + struct rpivid_dev *const dev = ctx->dev;
1018 + struct rpivid_run run = {};
1019 + struct media_request *src_req;
1020 +
1021 + run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
1022 + run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
1023 +
1024 + if (!run.src || !run.dst) {
1025 + v4l2_err(&dev->v4l2_dev, "%s: Missing buffer: src=%p, dst=%p\n",
1026 + __func__, run.src, run.dst);
1027 + goto fail;
1028 + }
1029 +
1030 + /* Apply request(s) controls */
1031 + src_req = run.src->vb2_buf.req_obj.req;
1032 + if (!src_req) {
1033 + v4l2_err(&dev->v4l2_dev, "%s: Missing request\n", __func__);
1034 + goto fail;
1035 + }
1036 +
1037 + v4l2_ctrl_request_setup(src_req, &ctx->hdl);
1038 +
1039 + switch (ctx->src_fmt.pixelformat) {
1040 + case V4L2_PIX_FMT_HEVC_SLICE:
1041 + {
1042 + const struct v4l2_ctrl *ctrl;
1043 +
1044 + run.h265.sps =
1045 + rpivid_find_control_data(ctx,
1046 + V4L2_CID_STATELESS_HEVC_SPS);
1047 + run.h265.pps =
1048 + rpivid_find_control_data(ctx,
1049 + V4L2_CID_STATELESS_HEVC_PPS);
1050 + run.h265.dec =
1051 + rpivid_find_control_data(ctx,
1052 + V4L2_CID_STATELESS_HEVC_DECODE_PARAMS);
1053 +
1054 + ctrl = rpivid_find_ctrl(ctx,
1055 + V4L2_CID_STATELESS_HEVC_SLICE_PARAMS);
1056 + if (!ctrl || !ctrl->elems) {
1057 + v4l2_err(&dev->v4l2_dev, "%s: Missing slice params\n",
1058 + __func__);
1059 + goto fail;
1060 + }
1061 + run.h265.slice_ents = ctrl->elems;
1062 + run.h265.slice_params = ctrl->p_cur.p;
1063 +
1064 + run.h265.scaling_matrix =
1065 + rpivid_find_control_data(ctx,
1066 + V4L2_CID_STATELESS_HEVC_SCALING_MATRIX);
1067 + break;
1068 + }
1069 +
1070 + default:
1071 + break;
1072 + }
1073 +
1074 + v4l2_m2m_buf_copy_metadata(run.src, run.dst, true);
1075 +
1076 + dev->dec_ops->setup(ctx, &run);
1077 +
1078 + /* Complete request(s) controls */
1079 + v4l2_ctrl_request_complete(src_req, &ctx->hdl);
1080 +
1081 + dev->dec_ops->trigger(ctx);
1082 + return;
1083 +
1084 +fail:
1085 + /* We really shouldn't get here but tidy up what we can */
1086 + v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev, ctx->fh.m2m_ctx,
1087 + VB2_BUF_STATE_ERROR);
1088 +}
1089 --- /dev/null
1090 +++ b/drivers/staging/media/rpivid/rpivid_dec.h
1091 @@ -0,0 +1,19 @@
1092 +/* SPDX-License-Identifier: GPL-2.0 */
1093 +/*
1094 + * Raspberry Pi HEVC driver
1095 + *
1096 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
1097 + *
1098 + * Based on the Cedrus VPU driver, that is:
1099 + *
1100 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
1101 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
1102 + * Copyright (C) 2018 Bootlin
1103 + */
1104 +
1105 +#ifndef _RPIVID_DEC_H_
1106 +#define _RPIVID_DEC_H_
1107 +
1108 +void rpivid_device_run(void *priv);
1109 +
1110 +#endif
1111 --- /dev/null
1112 +++ b/drivers/staging/media/rpivid/rpivid_h265.c
1113 @@ -0,0 +1,2698 @@
1114 +// SPDX-License-Identifier: GPL-2.0-or-later
1115 +/*
1116 + * Raspberry Pi HEVC driver
1117 + *
1118 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
1119 + *
1120 + * Based on the Cedrus VPU driver, that is:
1121 + *
1122 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
1123 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
1124 + * Copyright (C) 2018 Bootlin
1125 + */
1126 +
1127 +#include <linux/delay.h>
1128 +#include <linux/types.h>
1129 +
1130 +#include <media/videobuf2-dma-contig.h>
1131 +
1132 +#include "rpivid.h"
1133 +#include "rpivid_hw.h"
1134 +#include "rpivid_video.h"
1135 +
1136 +#define DEBUG_TRACE_P1_CMD 0
1137 +#define DEBUG_TRACE_EXECUTION 0
1138 +
1139 +#define USE_REQUEST_PIN 1
1140 +
1141 +#if DEBUG_TRACE_EXECUTION
1142 +#define xtrace_in(dev_, de_)\
1143 + v4l2_info(&(dev_)->v4l2_dev, "%s[%d]: in\n", __func__,\
1144 + (de_) == NULL ? -1 : (de_)->decode_order)
1145 +#define xtrace_ok(dev_, de_)\
1146 + v4l2_info(&(dev_)->v4l2_dev, "%s[%d]: ok\n", __func__,\
1147 + (de_) == NULL ? -1 : (de_)->decode_order)
1148 +#define xtrace_fin(dev_, de_)\
1149 + v4l2_info(&(dev_)->v4l2_dev, "%s[%d]: finish\n", __func__,\
1150 + (de_) == NULL ? -1 : (de_)->decode_order)
1151 +#define xtrace_fail(dev_, de_)\
1152 + v4l2_info(&(dev_)->v4l2_dev, "%s[%d]: FAIL\n", __func__,\
1153 + (de_) == NULL ? -1 : (de_)->decode_order)
1154 +#else
1155 +#define xtrace_in(dev_, de_)
1156 +#define xtrace_ok(dev_, de_)
1157 +#define xtrace_fin(dev_, de_)
1158 +#define xtrace_fail(dev_, de_)
1159 +#endif
1160 +
1161 +enum hevc_slice_type {
1162 + HEVC_SLICE_B = 0,
1163 + HEVC_SLICE_P = 1,
1164 + HEVC_SLICE_I = 2,
1165 +};
1166 +
1167 +enum hevc_layer { L0 = 0, L1 = 1 };
1168 +
1169 +static int gptr_alloc(struct rpivid_dev *const dev, struct rpivid_gptr *gptr,
1170 + size_t size, unsigned long attrs)
1171 +{
1172 + gptr->size = size;
1173 + gptr->attrs = attrs;
1174 + gptr->addr = 0;
1175 + gptr->ptr = dma_alloc_attrs(dev->dev, gptr->size, &gptr->addr,
1176 + GFP_KERNEL, gptr->attrs);
1177 + return !gptr->ptr ? -ENOMEM : 0;
1178 +}
1179 +
1180 +static void gptr_free(struct rpivid_dev *const dev,
1181 + struct rpivid_gptr *const gptr)
1182 +{
1183 + if (gptr->ptr)
1184 + dma_free_attrs(dev->dev, gptr->size, gptr->ptr, gptr->addr,
1185 + gptr->attrs);
1186 + gptr->size = 0;
1187 + gptr->ptr = NULL;
1188 + gptr->addr = 0;
1189 + gptr->attrs = 0;
1190 +}
1191 +
1192 +/* Realloc but do not copy
1193 + *
1194 + * Frees then allocs.
1195 + * If the alloc fails then it attempts to re-allocate the old size
1196 + * On error then check gptr->ptr to determine if anything is currently
1197 + * allocated.
1198 + */
1199 +static int gptr_realloc_new(struct rpivid_dev * const dev,
1200 + struct rpivid_gptr * const gptr, size_t size)
1201 +{
1202 + const size_t old_size = gptr->size;
1203 +
1204 + if (size == gptr->size)
1205 + return 0;
1206 +
1207 + if (gptr->ptr)
1208 + dma_free_attrs(dev->dev, gptr->size, gptr->ptr,
1209 + gptr->addr, gptr->attrs);
1210 +
1211 + gptr->addr = 0;
1212 + gptr->size = size;
1213 + gptr->ptr = dma_alloc_attrs(dev->dev, gptr->size,
1214 + &gptr->addr, GFP_KERNEL, gptr->attrs);
1215 +
1216 + if (!gptr->ptr) {
1217 + gptr->addr = 0;
1218 + gptr->size = old_size;
1219 + gptr->ptr = dma_alloc_attrs(dev->dev, gptr->size,
1220 + &gptr->addr, GFP_KERNEL, gptr->attrs);
1221 + if (!gptr->ptr) {
1222 + gptr->size = 0;
1223 + gptr->addr = 0;
1224 + gptr->attrs = 0;
1225 + }
1226 + return -ENOMEM;
1227 + }
1228 +
1229 + return 0;
1230 +}
1231 +
1232 +static size_t next_size(const size_t x)
1233 +{
1234 + return rpivid_round_up_size(x + 1);
1235 +}
1236 +
1237 +#define NUM_SCALING_FACTORS 4064 /* Not a typo = 0xbe0 + 0x400 */
1238 +
1239 +#define AXI_BASE64 0
1240 +
1241 +#define PROB_BACKUP ((20 << 12) + (20 << 6) + (0 << 0))
1242 +#define PROB_RELOAD ((20 << 12) + (20 << 0) + (0 << 6))
1243 +
1244 +#define HEVC_MAX_REFS V4L2_HEVC_DPB_ENTRIES_NUM_MAX
1245 +
1246 +//////////////////////////////////////////////////////////////////////////////
1247 +
1248 +struct rpi_cmd {
1249 + u32 addr;
1250 + u32 data;
1251 +} __packed;
1252 +
1253 +struct rpivid_q_aux {
1254 + unsigned int refcount;
1255 + unsigned int q_index;
1256 + struct rpivid_q_aux *next;
1257 + struct rpivid_gptr col;
1258 +};
1259 +
1260 +//////////////////////////////////////////////////////////////////////////////
1261 +
1262 +enum rpivid_decode_state {
1263 + RPIVID_DECODE_SLICE_START,
1264 + RPIVID_DECODE_SLICE_CONTINUE,
1265 + RPIVID_DECODE_ERROR_CONTINUE,
1266 + RPIVID_DECODE_ERROR_DONE,
1267 + RPIVID_DECODE_PHASE1,
1268 + RPIVID_DECODE_END,
1269 +};
1270 +
1271 +struct rpivid_dec_env {
1272 + struct rpivid_ctx *ctx;
1273 + struct rpivid_dec_env *next;
1274 +
1275 + enum rpivid_decode_state state;
1276 + unsigned int decode_order;
1277 + int p1_status; /* P1 status - what to realloc */
1278 +
1279 + struct rpi_cmd *cmd_fifo;
1280 + unsigned int cmd_len, cmd_max;
1281 + unsigned int num_slice_msgs;
1282 + unsigned int pic_width_in_ctbs_y;
1283 + unsigned int pic_height_in_ctbs_y;
1284 + unsigned int dpbno_col;
1285 + u32 reg_slicestart;
1286 + int collocated_from_l0_flag;
1287 + /*
1288 + * Last CTB/Tile X,Y processed by (wpp_)entry_point
1289 + * Could be in _state as P0 only but needs updating where _state
1290 + * is const
1291 + */
1292 + unsigned int entry_ctb_x;
1293 + unsigned int entry_ctb_y;
1294 + unsigned int entry_tile_x;
1295 + unsigned int entry_tile_y;
1296 + unsigned int entry_qp;
1297 + u32 entry_slice;
1298 +
1299 + u32 rpi_config2;
1300 + u32 rpi_framesize;
1301 + u32 rpi_currpoc;
1302 +
1303 + struct vb2_v4l2_buffer *frame_buf; // Detached dest buffer
1304 + struct vb2_v4l2_buffer *src_buf; // Detached src buffer
1305 + unsigned int frame_c_offset;
1306 + unsigned int frame_stride;
1307 + dma_addr_t frame_addr;
1308 + dma_addr_t ref_addrs[16];
1309 + struct rpivid_q_aux *frame_aux;
1310 + struct rpivid_q_aux *col_aux;
1311 +
1312 + dma_addr_t cmd_addr;
1313 + size_t cmd_size;
1314 +
1315 + dma_addr_t pu_base_vc;
1316 + dma_addr_t coeff_base_vc;
1317 + u32 pu_stride;
1318 + u32 coeff_stride;
1319 +
1320 + struct rpivid_gptr *bit_copy_gptr;
1321 + size_t bit_copy_len;
1322 +
1323 +#define SLICE_MSGS_MAX (2 * HEVC_MAX_REFS * 8 + 3)
1324 + u16 slice_msgs[SLICE_MSGS_MAX];
1325 + u8 scaling_factors[NUM_SCALING_FACTORS];
1326 +
1327 +#if USE_REQUEST_PIN
1328 + struct media_request *req_pin;
1329 +#else
1330 + struct media_request_object *req_obj;
1331 +#endif
1332 + struct rpivid_hw_irq_ent irq_ent;
1333 +};
1334 +
1335 +#define member_size(type, member) sizeof(((type *)0)->member)
1336 +
1337 +struct rpivid_dec_state {
1338 + struct v4l2_ctrl_hevc_sps sps;
1339 + struct v4l2_ctrl_hevc_pps pps;
1340 +
1341 + // Helper vars & tables derived from sps/pps
1342 + unsigned int log2_ctb_size; /* log2 width of a CTB */
1343 + unsigned int ctb_width; /* Width in CTBs */
1344 + unsigned int ctb_height; /* Height in CTBs */
1345 + unsigned int ctb_size; /* Pic area in CTBs */
1346 + unsigned int tile_width; /* Width in tiles */
1347 + unsigned int tile_height; /* Height in tiles */
1348 +
1349 + int *col_bd;
1350 + int *row_bd;
1351 + int *ctb_addr_rs_to_ts;
1352 + int *ctb_addr_ts_to_rs;
1353 +
1354 + // Aux storage for DPB
1355 + // Hold refs
1356 + struct rpivid_q_aux *ref_aux[HEVC_MAX_REFS];
1357 + struct rpivid_q_aux *frame_aux;
1358 +
1359 + // Slice vars
1360 + unsigned int slice_idx;
1361 + bool slice_temporal_mvp; /* Slice flag but constant for frame */
1362 + bool use_aux;
1363 + bool mk_aux;
1364 +
1365 + // Temp vars per run - don't actually need to persist
1366 + u8 *src_buf;
1367 + dma_addr_t src_addr;
1368 + const struct v4l2_ctrl_hevc_slice_params *sh;
1369 + const struct v4l2_ctrl_hevc_decode_params *dec;
1370 + unsigned int nb_refs[2];
1371 + unsigned int slice_qp;
1372 + unsigned int max_num_merge_cand; // 0 if I-slice
1373 + bool dependent_slice_segment_flag;
1374 +
1375 + unsigned int start_ts; /* slice_segment_addr -> ts */
1376 + unsigned int start_ctb_x; /* CTB X,Y of start_ts */
1377 + unsigned int start_ctb_y;
1378 + unsigned int prev_ctb_x; /* CTB X,Y of start_ts - 1 */
1379 + unsigned int prev_ctb_y;
1380 +};
1381 +
1382 +#if !USE_REQUEST_PIN
1383 +static void dst_req_obj_release(struct media_request_object *object)
1384 +{
1385 + kfree(object);
1386 +}
1387 +
1388 +static const struct media_request_object_ops dst_req_obj_ops = {
1389 + .release = dst_req_obj_release,
1390 +};
1391 +#endif
1392 +
1393 +static inline int clip_int(const int x, const int lo, const int hi)
1394 +{
1395 + return x < lo ? lo : x > hi ? hi : x;
1396 +}
1397 +
1398 +//////////////////////////////////////////////////////////////////////////////
1399 +// Phase 1 command and bit FIFOs
1400 +
1401 +#if DEBUG_TRACE_P1_CMD
1402 +static int p1_z;
1403 +#endif
1404 +
1405 +static int cmds_check_space(struct rpivid_dec_env *const de, unsigned int n)
1406 +{
1407 + struct rpi_cmd *a;
1408 + unsigned int newmax;
1409 +
1410 + if (n > 0x100000) {
1411 + v4l2_err(&de->ctx->dev->v4l2_dev,
1412 + "%s: n %u implausible\n", __func__, n);
1413 + return -ENOMEM;
1414 + }
1415 +
1416 + if (de->cmd_len + n <= de->cmd_max)
1417 + return 0;
1418 +
1419 + newmax = roundup_pow_of_two(de->cmd_len + n);
1420 +
1421 + a = krealloc(de->cmd_fifo, newmax * sizeof(struct rpi_cmd),
1422 + GFP_KERNEL);
1423 + if (!a) {
1424 + v4l2_err(&de->ctx->dev->v4l2_dev,
1425 + "Failed cmd buffer realloc from %u to %u\n",
1426 + de->cmd_max, newmax);
1427 + return -ENOMEM;
1428 + }
1429 + v4l2_info(&de->ctx->dev->v4l2_dev,
1430 + "cmd buffer realloc from %u to %u\n", de->cmd_max, newmax);
1431 +
1432 + de->cmd_fifo = a;
1433 + de->cmd_max = newmax;
1434 + return 0;
1435 +}
1436 +
1437 +// ???? u16 addr - put in u32
1438 +static void p1_apb_write(struct rpivid_dec_env *const de, const u16 addr,
1439 + const u32 data)
1440 +{
1441 + if (de->cmd_len >= de->cmd_max) {
1442 + v4l2_err(&de->ctx->dev->v4l2_dev,
1443 + "%s: Overflow @ %d\n", __func__, de->cmd_len);
1444 + return;
1445 + }
1446 +
1447 + de->cmd_fifo[de->cmd_len].addr = addr;
1448 + de->cmd_fifo[de->cmd_len].data = data;
1449 +
1450 +#if DEBUG_TRACE_P1_CMD
1451 + if (++p1_z < 256) {
1452 + v4l2_info(&de->ctx->dev->v4l2_dev, "[%02x] %x %x\n",
1453 + de->cmd_len, addr, data);
1454 + }
1455 +#endif
1456 + de->cmd_len++;
1457 +}
1458 +
1459 +static int ctb_to_tile(unsigned int ctb, unsigned int *bd, int num)
1460 +{
1461 + int i;
1462 +
1463 + for (i = 1; ctb >= bd[i]; i++)
1464 + ; // bd[] has num+1 elements; bd[0]=0;
1465 + return i - 1;
1466 +}
1467 +
1468 +static unsigned int ctb_to_tile_x(const struct rpivid_dec_state *const s,
1469 + const unsigned int ctb_x)
1470 +{
1471 + return ctb_to_tile(ctb_x, s->col_bd, s->tile_width);
1472 +}
1473 +
1474 +static unsigned int ctb_to_tile_y(const struct rpivid_dec_state *const s,
1475 + const unsigned int ctb_y)
1476 +{
1477 + return ctb_to_tile(ctb_y, s->row_bd, s->tile_height);
1478 +}
1479 +
1480 +static void aux_q_free(struct rpivid_ctx *const ctx,
1481 + struct rpivid_q_aux *const aq)
1482 +{
1483 + struct rpivid_dev *const dev = ctx->dev;
1484 +
1485 + gptr_free(dev, &aq->col);
1486 + kfree(aq);
1487 +}
1488 +
1489 +static struct rpivid_q_aux *aux_q_alloc(struct rpivid_ctx *const ctx,
1490 + const unsigned int q_index)
1491 +{
1492 + struct rpivid_dev *const dev = ctx->dev;
1493 + struct rpivid_q_aux *const aq = kzalloc(sizeof(*aq), GFP_KERNEL);
1494 +
1495 + if (!aq)
1496 + return NULL;
1497 +
1498 + if (gptr_alloc(dev, &aq->col, ctx->colmv_picsize,
1499 + DMA_ATTR_FORCE_CONTIGUOUS | DMA_ATTR_NO_KERNEL_MAPPING))
1500 + goto fail;
1501 +
1502 + /*
1503 + * Spinlock not required as called in P0 only and
1504 + * aux checks done by _new
1505 + */
1506 + aq->refcount = 1;
1507 + aq->q_index = q_index;
1508 + ctx->aux_ents[q_index] = aq;
1509 + return aq;
1510 +
1511 +fail:
1512 + kfree(aq);
1513 + return NULL;
1514 +}
1515 +
1516 +static struct rpivid_q_aux *aux_q_new(struct rpivid_ctx *const ctx,
1517 + const unsigned int q_index)
1518 +{
1519 + struct rpivid_q_aux *aq;
1520 + unsigned long lockflags;
1521 +
1522 + spin_lock_irqsave(&ctx->aux_lock, lockflags);
1523 + /*
1524 + * If we already have this allocated to a slot then use that
1525 + * and assume that it will all work itself out in the pipeline
1526 + */
1527 + if ((aq = ctx->aux_ents[q_index]) != NULL) {
1528 + ++aq->refcount;
1529 + } else if ((aq = ctx->aux_free) != NULL) {
1530 + ctx->aux_free = aq->next;
1531 + aq->next = NULL;
1532 + aq->refcount = 1;
1533 + aq->q_index = q_index;
1534 + ctx->aux_ents[q_index] = aq;
1535 + }
1536 + spin_unlock_irqrestore(&ctx->aux_lock, lockflags);
1537 +
1538 + if (!aq)
1539 + aq = aux_q_alloc(ctx, q_index);
1540 +
1541 + return aq;
1542 +}
1543 +
1544 +static struct rpivid_q_aux *aux_q_ref_idx(struct rpivid_ctx *const ctx,
1545 + const int q_index)
1546 +{
1547 + unsigned long lockflags;
1548 + struct rpivid_q_aux *aq;
1549 +
1550 + spin_lock_irqsave(&ctx->aux_lock, lockflags);
1551 + if ((aq = ctx->aux_ents[q_index]) != NULL)
1552 + ++aq->refcount;
1553 + spin_unlock_irqrestore(&ctx->aux_lock, lockflags);
1554 +
1555 + return aq;
1556 +}
1557 +
1558 +static struct rpivid_q_aux *aux_q_ref(struct rpivid_ctx *const ctx,
1559 + struct rpivid_q_aux *const aq)
1560 +{
1561 + if (aq) {
1562 + unsigned long lockflags;
1563 +
1564 + spin_lock_irqsave(&ctx->aux_lock, lockflags);
1565 +
1566 + ++aq->refcount;
1567 +
1568 + spin_unlock_irqrestore(&ctx->aux_lock, lockflags);
1569 + }
1570 + return aq;
1571 +}
1572 +
1573 +static void aux_q_release(struct rpivid_ctx *const ctx,
1574 + struct rpivid_q_aux **const paq)
1575 +{
1576 + struct rpivid_q_aux *const aq = *paq;
1577 + unsigned long lockflags;
1578 +
1579 + if (!aq)
1580 + return;
1581 +
1582 + *paq = NULL;
1583 +
1584 + spin_lock_irqsave(&ctx->aux_lock, lockflags);
1585 + if (--aq->refcount == 0) {
1586 + aq->next = ctx->aux_free;
1587 + ctx->aux_free = aq;
1588 + ctx->aux_ents[aq->q_index] = NULL;
1589 + aq->q_index = ~0U;
1590 + }
1591 + spin_unlock_irqrestore(&ctx->aux_lock, lockflags);
1592 +}
1593 +
1594 +static void aux_q_init(struct rpivid_ctx *const ctx)
1595 +{
1596 + spin_lock_init(&ctx->aux_lock);
1597 + ctx->aux_free = NULL;
1598 +}
1599 +
1600 +static void aux_q_uninit(struct rpivid_ctx *const ctx)
1601 +{
1602 + struct rpivid_q_aux *aq;
1603 +
1604 + ctx->colmv_picsize = 0;
1605 + ctx->colmv_stride = 0;
1606 + while ((aq = ctx->aux_free) != NULL) {
1607 + ctx->aux_free = aq->next;
1608 + aux_q_free(ctx, aq);
1609 + }
1610 +}
1611 +
1612 +//////////////////////////////////////////////////////////////////////////////
1613 +
1614 +/*
1615 + * Initialisation process for context variables (CABAC init)
1616 + * see H.265 9.3.2.2
1617 + *
1618 + * N.B. If comparing with FFmpeg note that this h/w uses slightly different
1619 + * offsets to FFmpeg's array
1620 + */
1621 +
1622 +/* Actual number of values */
1623 +#define RPI_PROB_VALS 154U
1624 +/* Rounded up as we copy words */
1625 +#define RPI_PROB_ARRAY_SIZE ((154 + 3) & ~3)
1626 +
1627 +/* Initialiser values - see tables H.265 9-4 through 9-42 */
1628 +static const u8 prob_init[3][156] = {
1629 + {
1630 + 153, 200, 139, 141, 157, 154, 154, 154, 154, 154, 184, 154, 154,
1631 + 154, 184, 63, 154, 154, 154, 154, 154, 154, 154, 154, 154, 154,
1632 + 154, 154, 154, 153, 138, 138, 111, 141, 94, 138, 182, 154, 154,
1633 + 154, 140, 92, 137, 138, 140, 152, 138, 139, 153, 74, 149, 92,
1634 + 139, 107, 122, 152, 140, 179, 166, 182, 140, 227, 122, 197, 110,
1635 + 110, 124, 125, 140, 153, 125, 127, 140, 109, 111, 143, 127, 111,
1636 + 79, 108, 123, 63, 110, 110, 124, 125, 140, 153, 125, 127, 140,
1637 + 109, 111, 143, 127, 111, 79, 108, 123, 63, 91, 171, 134, 141,
1638 + 138, 153, 136, 167, 152, 152, 139, 139, 111, 111, 125, 110, 110,
1639 + 94, 124, 108, 124, 107, 125, 141, 179, 153, 125, 107, 125, 141,
1640 + 179, 153, 125, 107, 125, 141, 179, 153, 125, 140, 139, 182, 182,
1641 + 152, 136, 152, 136, 153, 136, 139, 111, 136, 139, 111, 0, 0,
1642 + },
1643 + {
1644 + 153, 185, 107, 139, 126, 197, 185, 201, 154, 149, 154, 139, 154,
1645 + 154, 154, 152, 110, 122, 95, 79, 63, 31, 31, 153, 153, 168,
1646 + 140, 198, 79, 124, 138, 94, 153, 111, 149, 107, 167, 154, 154,
1647 + 154, 154, 196, 196, 167, 154, 152, 167, 182, 182, 134, 149, 136,
1648 + 153, 121, 136, 137, 169, 194, 166, 167, 154, 167, 137, 182, 125,
1649 + 110, 94, 110, 95, 79, 125, 111, 110, 78, 110, 111, 111, 95,
1650 + 94, 108, 123, 108, 125, 110, 94, 110, 95, 79, 125, 111, 110,
1651 + 78, 110, 111, 111, 95, 94, 108, 123, 108, 121, 140, 61, 154,
1652 + 107, 167, 91, 122, 107, 167, 139, 139, 155, 154, 139, 153, 139,
1653 + 123, 123, 63, 153, 166, 183, 140, 136, 153, 154, 166, 183, 140,
1654 + 136, 153, 154, 166, 183, 140, 136, 153, 154, 170, 153, 123, 123,
1655 + 107, 121, 107, 121, 167, 151, 183, 140, 151, 183, 140, 0, 0,
1656 + },
1657 + {
1658 + 153, 160, 107, 139, 126, 197, 185, 201, 154, 134, 154, 139, 154,
1659 + 154, 183, 152, 154, 137, 95, 79, 63, 31, 31, 153, 153, 168,
1660 + 169, 198, 79, 224, 167, 122, 153, 111, 149, 92, 167, 154, 154,
1661 + 154, 154, 196, 167, 167, 154, 152, 167, 182, 182, 134, 149, 136,
1662 + 153, 121, 136, 122, 169, 208, 166, 167, 154, 152, 167, 182, 125,
1663 + 110, 124, 110, 95, 94, 125, 111, 111, 79, 125, 126, 111, 111,
1664 + 79, 108, 123, 93, 125, 110, 124, 110, 95, 94, 125, 111, 111,
1665 + 79, 125, 126, 111, 111, 79, 108, 123, 93, 121, 140, 61, 154,
1666 + 107, 167, 91, 107, 107, 167, 139, 139, 170, 154, 139, 153, 139,
1667 + 123, 123, 63, 124, 166, 183, 140, 136, 153, 154, 166, 183, 140,
1668 + 136, 153, 154, 166, 183, 140, 136, 153, 154, 170, 153, 138, 138,
1669 + 122, 121, 122, 121, 167, 151, 183, 140, 151, 183, 140, 0, 0,
1670 + },
1671 +};
1672 +
1673 +#define CMDS_WRITE_PROB ((RPI_PROB_ARRAY_SIZE / 4) + 1)
1674 +static void write_prob(struct rpivid_dec_env *const de,
1675 + const struct rpivid_dec_state *const s)
1676 +{
1677 + u8 dst[RPI_PROB_ARRAY_SIZE];
1678 +
1679 + const unsigned int init_type =
1680 + ((s->sh->flags & V4L2_HEVC_SLICE_PARAMS_FLAG_CABAC_INIT) != 0 &&
1681 + s->sh->slice_type != HEVC_SLICE_I) ?
1682 + s->sh->slice_type + 1 :
1683 + 2 - s->sh->slice_type;
1684 + const u8 *p = prob_init[init_type];
1685 + const int q = clip_int(s->slice_qp, 0, 51);
1686 + unsigned int i;
1687 +
1688 + for (i = 0; i < RPI_PROB_VALS; i++) {
1689 + int init_value = p[i];
1690 + int m = (init_value >> 4) * 5 - 45;
1691 + int n = ((init_value & 15) << 3) - 16;
1692 + int pre = 2 * (((m * q) >> 4) + n) - 127;
1693 +
1694 + pre ^= pre >> 31;
1695 + if (pre > 124)
1696 + pre = 124 + (pre & 1);
1697 + dst[i] = pre;
1698 + }
1699 + for (i = RPI_PROB_VALS; i != RPI_PROB_ARRAY_SIZE; ++i)
1700 + dst[i] = 0;
1701 +
1702 + for (i = 0; i < RPI_PROB_ARRAY_SIZE; i += 4)
1703 + p1_apb_write(de, 0x1000 + i,
1704 + dst[i] + (dst[i + 1] << 8) + (dst[i + 2] << 16) +
1705 + (dst[i + 3] << 24));
1706 +
1707 + /*
1708 + * Having written the prob array, back it up
1709 + * This is not always needed but is a small overhead that simplifies
1710 + * (and speeds up) some multi-tile & WPP scenarios
1711 + * There are no scenarios where having written a prob we ever want
1712 + * a previous (non-initial) state back
1713 + */
1714 + p1_apb_write(de, RPI_TRANSFER, PROB_BACKUP);
1715 +}
1716 +
1717 +#define CMDS_WRITE_SCALING_FACTORS NUM_SCALING_FACTORS
1718 +static void write_scaling_factors(struct rpivid_dec_env *const de)
1719 +{
1720 + int i;
1721 + const u8 *p = (u8 *)de->scaling_factors;
1722 +
1723 + for (i = 0; i < NUM_SCALING_FACTORS; i += 4, p += 4)
1724 + p1_apb_write(de, 0x2000 + i,
1725 + p[0] + (p[1] << 8) + (p[2] << 16) + (p[3] << 24));
1726 +}
1727 +
1728 +static inline __u32 dma_to_axi_addr(dma_addr_t a)
1729 +{
1730 + return (__u32)(a >> 6);
1731 +}
1732 +
1733 +#define CMDS_WRITE_BITSTREAM 4
1734 +static int write_bitstream(struct rpivid_dec_env *const de,
1735 + const struct rpivid_dec_state *const s)
1736 +{
1737 + // Note that FFmpeg V4L2 does not remove emulation prevention bytes,
1738 + // so this is matched in the configuration here.
1739 + // Whether that is the correct behaviour or not is not clear in the
1740 + // spec.
1741 + const int rpi_use_emu = 1;
1742 + unsigned int offset = s->sh->data_byte_offset;
1743 + const unsigned int len = (s->sh->bit_size + 7) / 8 - offset;
1744 + dma_addr_t addr;
1745 +
1746 + if (s->src_addr != 0) {
1747 + addr = s->src_addr + offset;
1748 + } else {
1749 + if (len + de->bit_copy_len > de->bit_copy_gptr->size) {
1750 + v4l2_warn(&de->ctx->dev->v4l2_dev,
1751 + "Bit copy buffer overflow: size=%zu, offset=%zu, len=%u\n",
1752 + de->bit_copy_gptr->size,
1753 + de->bit_copy_len, len);
1754 + return -ENOMEM;
1755 + }
1756 + memcpy(de->bit_copy_gptr->ptr + de->bit_copy_len,
1757 + s->src_buf + offset, len);
1758 + addr = de->bit_copy_gptr->addr + de->bit_copy_len;
1759 + de->bit_copy_len += (len + 63) & ~63;
1760 + }
1761 + offset = addr & 63;
1762 +
1763 + p1_apb_write(de, RPI_BFBASE, dma_to_axi_addr(addr));
1764 + p1_apb_write(de, RPI_BFNUM, len);
1765 + p1_apb_write(de, RPI_BFCONTROL, offset + (1 << 7)); // Stop
1766 + p1_apb_write(de, RPI_BFCONTROL, offset + (rpi_use_emu << 6));
1767 + return 0;
1768 +}
1769 +
1770 +//////////////////////////////////////////////////////////////////////////////
1771 +
1772 +/*
1773 + * The slice constant part of the slice register - width and height need to
1774 + * be ORed in later as they are per-tile / WPP-row
1775 + */
1776 +static u32 slice_reg_const(const struct rpivid_dec_state *const s)
1777 +{
1778 + u32 x = (s->max_num_merge_cand << 0) |
1779 + (s->nb_refs[L0] << 4) |
1780 + (s->nb_refs[L1] << 8) |
1781 + (s->sh->slice_type << 12);
1782 +
1783 + if (s->sh->flags & V4L2_HEVC_SLICE_PARAMS_FLAG_SLICE_SAO_LUMA)
1784 + x |= BIT(14);
1785 + if (s->sh->flags & V4L2_HEVC_SLICE_PARAMS_FLAG_SLICE_SAO_CHROMA)
1786 + x |= BIT(15);
1787 + if (s->sh->slice_type == HEVC_SLICE_B &&
1788 + (s->sh->flags & V4L2_HEVC_SLICE_PARAMS_FLAG_MVD_L1_ZERO))
1789 + x |= BIT(16);
1790 +
1791 + return x;
1792 +}
1793 +
1794 +//////////////////////////////////////////////////////////////////////////////
1795 +
1796 +#define CMDS_NEW_SLICE_SEGMENT (4 + CMDS_WRITE_SCALING_FACTORS)
1797 +static void new_slice_segment(struct rpivid_dec_env *const de,
1798 + const struct rpivid_dec_state *const s)
1799 +{
1800 + const struct v4l2_ctrl_hevc_sps *const sps = &s->sps;
1801 + const struct v4l2_ctrl_hevc_pps *const pps = &s->pps;
1802 +
1803 + p1_apb_write(de,
1804 + RPI_SPS0,
1805 + ((sps->log2_min_luma_coding_block_size_minus3 + 3) << 0) |
1806 + (s->log2_ctb_size << 4) |
1807 + ((sps->log2_min_luma_transform_block_size_minus2 + 2)
1808 + << 8) |
1809 + ((sps->log2_min_luma_transform_block_size_minus2 + 2 +
1810 + sps->log2_diff_max_min_luma_transform_block_size)
1811 + << 12) |
1812 + ((sps->bit_depth_luma_minus8 + 8) << 16) |
1813 + ((sps->bit_depth_chroma_minus8 + 8) << 20) |
1814 + (sps->max_transform_hierarchy_depth_intra << 24) |
1815 + (sps->max_transform_hierarchy_depth_inter << 28));
1816 +
1817 + p1_apb_write(de,
1818 + RPI_SPS1,
1819 + ((sps->pcm_sample_bit_depth_luma_minus1 + 1) << 0) |
1820 + ((sps->pcm_sample_bit_depth_chroma_minus1 + 1) << 4) |
1821 + ((sps->log2_min_pcm_luma_coding_block_size_minus3 + 3)
1822 + << 8) |
1823 + ((sps->log2_min_pcm_luma_coding_block_size_minus3 + 3 +
1824 + sps->log2_diff_max_min_pcm_luma_coding_block_size)
1825 + << 12) |
1826 + (((sps->flags & V4L2_HEVC_SPS_FLAG_SEPARATE_COLOUR_PLANE) ?
1827 + 0 : sps->chroma_format_idc) << 16) |
1828 + ((!!(sps->flags & V4L2_HEVC_SPS_FLAG_AMP_ENABLED)) << 18) |
1829 + ((!!(sps->flags & V4L2_HEVC_SPS_FLAG_PCM_ENABLED)) << 19) |
1830 + ((!!(sps->flags & V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED))
1831 + << 20) |
1832 + ((!!(sps->flags &
1833 + V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED))
1834 + << 21));
1835 +
1836 + p1_apb_write(de,
1837 + RPI_PPS,
1838 + ((s->log2_ctb_size - pps->diff_cu_qp_delta_depth) << 0) |
1839 + ((!!(pps->flags & V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED))
1840 + << 4) |
1841 + ((!!(pps->flags &
1842 + V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED))
1843 + << 5) |
1844 + ((!!(pps->flags & V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED))
1845 + << 6) |
1846 + ((!!(pps->flags &
1847 + V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED))
1848 + << 7) |
1849 + (((pps->pps_cb_qp_offset + s->sh->slice_cb_qp_offset) & 255)
1850 + << 8) |
1851 + (((pps->pps_cr_qp_offset + s->sh->slice_cr_qp_offset) & 255)
1852 + << 16) |
1853 + ((!!(pps->flags &
1854 + V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED))
1855 + << 24));
1856 +
1857 + if (!s->start_ts &&
1858 + (sps->flags & V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED) != 0)
1859 + write_scaling_factors(de);
1860 +
1861 + if (!s->dependent_slice_segment_flag) {
1862 + int ctb_col = s->sh->slice_segment_addr %
1863 + de->pic_width_in_ctbs_y;
1864 + int ctb_row = s->sh->slice_segment_addr /
1865 + de->pic_width_in_ctbs_y;
1866 +
1867 + de->reg_slicestart = (ctb_col << 0) + (ctb_row << 16);
1868 + }
1869 +
1870 + p1_apb_write(de, RPI_SLICESTART, de->reg_slicestart);
1871 +}
1872 +
1873 +//////////////////////////////////////////////////////////////////////////////
1874 +// Slice messages
1875 +
1876 +static void msg_slice(struct rpivid_dec_env *const de, const u16 msg)
1877 +{
1878 + de->slice_msgs[de->num_slice_msgs++] = msg;
1879 +}
1880 +
1881 +#define CMDS_PROGRAM_SLICECMDS (1 + SLICE_MSGS_MAX)
1882 +static void program_slicecmds(struct rpivid_dec_env *const de,
1883 + const int sliceid)
1884 +{
1885 + int i;
1886 +
1887 + p1_apb_write(de, RPI_SLICECMDS, de->num_slice_msgs + (sliceid << 8));
1888 +
1889 + for (i = 0; i < de->num_slice_msgs; i++)
1890 + p1_apb_write(de, 0x4000 + 4 * i, de->slice_msgs[i] & 0xffff);
1891 +}
1892 +
1893 +// NoBackwardPredictionFlag 8.3.5
1894 +// Simply checks POCs
1895 +static int has_backward(const struct v4l2_hevc_dpb_entry *const dpb,
1896 + const __u8 *const idx, const unsigned int n,
1897 + const s32 cur_poc)
1898 +{
1899 + unsigned int i;
1900 +
1901 + for (i = 0; i < n; ++i) {
1902 + if (cur_poc < dpb[idx[i]].pic_order_cnt_val)
1903 + return 0;
1904 + }
1905 + return 1;
1906 +}
1907 +
1908 +static void pre_slice_decode(struct rpivid_dec_env *const de,
1909 + const struct rpivid_dec_state *const s)
1910 +{
1911 + const struct v4l2_ctrl_hevc_slice_params *const sh = s->sh;
1912 + const struct v4l2_ctrl_hevc_decode_params *const dec = s->dec;
1913 + int weighted_pred_flag, idx;
1914 + u16 cmd_slice;
1915 + unsigned int collocated_from_l0_flag;
1916 +
1917 + de->num_slice_msgs = 0;
1918 +
1919 + cmd_slice = 0;
1920 + if (sh->slice_type == HEVC_SLICE_I)
1921 + cmd_slice = 1;
1922 + if (sh->slice_type == HEVC_SLICE_P)
1923 + cmd_slice = 2;
1924 + if (sh->slice_type == HEVC_SLICE_B)
1925 + cmd_slice = 3;
1926 +
1927 + cmd_slice |= (s->nb_refs[L0] << 2) | (s->nb_refs[L1] << 6) |
1928 + (s->max_num_merge_cand << 11);
1929 +
1930 + collocated_from_l0_flag =
1931 + !s->slice_temporal_mvp ||
1932 + sh->slice_type != HEVC_SLICE_B ||
1933 + (sh->flags & V4L2_HEVC_SLICE_PARAMS_FLAG_COLLOCATED_FROM_L0);
1934 + cmd_slice |= collocated_from_l0_flag << 14;
1935 +
1936 + if (sh->slice_type == HEVC_SLICE_P || sh->slice_type == HEVC_SLICE_B) {
1937 + // Flag to say all reference pictures are from the past
1938 + const int no_backward_pred_flag =
1939 + has_backward(dec->dpb, sh->ref_idx_l0, s->nb_refs[L0],
1940 + sh->slice_pic_order_cnt) &&
1941 + has_backward(dec->dpb, sh->ref_idx_l1, s->nb_refs[L1],
1942 + sh->slice_pic_order_cnt);
1943 + cmd_slice |= no_backward_pred_flag << 10;
1944 + msg_slice(de, cmd_slice);
1945 +
1946 + if (s->slice_temporal_mvp) {
1947 + const __u8 *const rpl = collocated_from_l0_flag ?
1948 + sh->ref_idx_l0 : sh->ref_idx_l1;
1949 + de->dpbno_col = rpl[sh->collocated_ref_idx];
1950 + //v4l2_info(&de->ctx->dev->v4l2_dev,
1951 + // "L0=%d col_ref_idx=%d,
1952 + // dpb_no=%d\n", collocated_from_l0_flag,
1953 + // sh->collocated_ref_idx, de->dpbno_col);
1954 + }
1955 +
1956 + // Write reference picture descriptions
1957 + weighted_pred_flag =
1958 + sh->slice_type == HEVC_SLICE_P ?
1959 + !!(s->pps.flags & V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED) :
1960 + !!(s->pps.flags & V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED);
1961 +
1962 + for (idx = 0; idx < s->nb_refs[L0]; ++idx) {
1963 + unsigned int dpb_no = sh->ref_idx_l0[idx];
1964 + //v4l2_info(&de->ctx->dev->v4l2_dev,
1965 + // "L0[%d]=dpb[%d]\n", idx, dpb_no);
1966 +
1967 + msg_slice(de,
1968 + dpb_no |
1969 + ((dec->dpb[dpb_no].flags &
1970 + V4L2_HEVC_DPB_ENTRY_LONG_TERM_REFERENCE) ?
1971 + (1 << 4) : 0) |
1972 + (weighted_pred_flag ? (3 << 5) : 0));
1973 + msg_slice(de, dec->dpb[dpb_no].pic_order_cnt_val & 0xffff);
1974 +
1975 + if (weighted_pred_flag) {
1976 + const struct v4l2_hevc_pred_weight_table
1977 + *const w = &sh->pred_weight_table;
1978 + const int luma_weight_denom =
1979 + (1 << w->luma_log2_weight_denom);
1980 + const unsigned int chroma_log2_weight_denom =
1981 + (w->luma_log2_weight_denom +
1982 + w->delta_chroma_log2_weight_denom);
1983 + const int chroma_weight_denom =
1984 + (1 << chroma_log2_weight_denom);
1985 +
1986 + msg_slice(de,
1987 + w->luma_log2_weight_denom |
1988 + (((w->delta_luma_weight_l0[idx] +
1989 + luma_weight_denom) & 0x1ff)
1990 + << 3));
1991 + msg_slice(de, w->luma_offset_l0[idx] & 0xff);
1992 + msg_slice(de,
1993 + chroma_log2_weight_denom |
1994 + (((w->delta_chroma_weight_l0[idx][0] +
1995 + chroma_weight_denom) & 0x1ff)
1996 + << 3));
1997 + msg_slice(de,
1998 + w->chroma_offset_l0[idx][0] & 0xff);
1999 + msg_slice(de,
2000 + chroma_log2_weight_denom |
2001 + (((w->delta_chroma_weight_l0[idx][1] +
2002 + chroma_weight_denom) & 0x1ff)
2003 + << 3));
2004 + msg_slice(de,
2005 + w->chroma_offset_l0[idx][1] & 0xff);
2006 + }
2007 + }
2008 +
2009 + for (idx = 0; idx < s->nb_refs[L1]; ++idx) {
2010 + unsigned int dpb_no = sh->ref_idx_l1[idx];
2011 + //v4l2_info(&de->ctx->dev->v4l2_dev,
2012 + // "L1[%d]=dpb[%d]\n", idx, dpb_no);
2013 + msg_slice(de,
2014 + dpb_no |
2015 + ((dec->dpb[dpb_no].flags &
2016 + V4L2_HEVC_DPB_ENTRY_LONG_TERM_REFERENCE) ?
2017 + (1 << 4) : 0) |
2018 + (weighted_pred_flag ? (3 << 5) : 0));
2019 + msg_slice(de, dec->dpb[dpb_no].pic_order_cnt_val & 0xffff);
2020 + if (weighted_pred_flag) {
2021 + const struct v4l2_hevc_pred_weight_table
2022 + *const w = &sh->pred_weight_table;
2023 + const int luma_weight_denom =
2024 + (1 << w->luma_log2_weight_denom);
2025 + const unsigned int chroma_log2_weight_denom =
2026 + (w->luma_log2_weight_denom +
2027 + w->delta_chroma_log2_weight_denom);
2028 + const int chroma_weight_denom =
2029 + (1 << chroma_log2_weight_denom);
2030 +
2031 + msg_slice(de,
2032 + w->luma_log2_weight_denom |
2033 + (((w->delta_luma_weight_l1[idx] +
2034 + luma_weight_denom) & 0x1ff) << 3));
2035 + msg_slice(de, w->luma_offset_l1[idx] & 0xff);
2036 + msg_slice(de,
2037 + chroma_log2_weight_denom |
2038 + (((w->delta_chroma_weight_l1[idx][0] +
2039 + chroma_weight_denom) & 0x1ff)
2040 + << 3));
2041 + msg_slice(de,
2042 + w->chroma_offset_l1[idx][0] & 0xff);
2043 + msg_slice(de,
2044 + chroma_log2_weight_denom |
2045 + (((w->delta_chroma_weight_l1[idx][1] +
2046 + chroma_weight_denom) & 0x1ff)
2047 + << 3));
2048 + msg_slice(de,
2049 + w->chroma_offset_l1[idx][1] & 0xff);
2050 + }
2051 + }
2052 + } else {
2053 + msg_slice(de, cmd_slice);
2054 + }
2055 +
2056 + msg_slice(de,
2057 + (sh->slice_beta_offset_div2 & 15) |
2058 + ((sh->slice_tc_offset_div2 & 15) << 4) |
2059 + ((sh->flags &
2060 + V4L2_HEVC_SLICE_PARAMS_FLAG_SLICE_DEBLOCKING_FILTER_DISABLED) ?
2061 + 1 << 8 : 0) |
2062 + ((sh->flags &
2063 + V4L2_HEVC_SLICE_PARAMS_FLAG_SLICE_LOOP_FILTER_ACROSS_SLICES_ENABLED) ?
2064 + 1 << 9 : 0) |
2065 + ((s->pps.flags &
2066 + V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED) ?
2067 + 1 << 10 : 0));
2068 +
2069 + msg_slice(de, ((sh->slice_cr_qp_offset & 31) << 5) +
2070 + (sh->slice_cb_qp_offset & 31)); // CMD_QPOFF
2071 +}
2072 +
2073 +#define CMDS_WRITE_SLICE 1
2074 +static void write_slice(struct rpivid_dec_env *const de,
2075 + const struct rpivid_dec_state *const s,
2076 + const u32 slice_const,
2077 + const unsigned int ctb_col,
2078 + const unsigned int ctb_row)
2079 +{
2080 + const unsigned int cs = (1 << s->log2_ctb_size);
2081 + const unsigned int w_last = s->sps.pic_width_in_luma_samples & (cs - 1);
2082 + const unsigned int h_last = s->sps.pic_height_in_luma_samples & (cs - 1);
2083 +
2084 + p1_apb_write(de, RPI_SLICE,
2085 + slice_const |
2086 + ((ctb_col + 1 < s->ctb_width || !w_last ?
2087 + cs : w_last) << 17) |
2088 + ((ctb_row + 1 < s->ctb_height || !h_last ?
2089 + cs : h_last) << 24));
2090 +}
2091 +
2092 +#define PAUSE_MODE_WPP 1
2093 +#define PAUSE_MODE_TILE 0xffff
2094 +
2095 +/*
2096 + * N.B. This can be called to fill in data from the previous slice so must not
2097 + * use any state data that may change from slice to slice (e.g. qp)
2098 + */
2099 +#define CMDS_NEW_ENTRY_POINT (6 + CMDS_WRITE_SLICE)
2100 +static void new_entry_point(struct rpivid_dec_env *const de,
2101 + const struct rpivid_dec_state *const s,
2102 + const bool do_bte,
2103 + const bool reset_qp_y,
2104 + const u32 pause_mode,
2105 + const unsigned int tile_x,
2106 + const unsigned int tile_y,
2107 + const unsigned int ctb_col,
2108 + const unsigned int ctb_row,
2109 + const unsigned int slice_qp,
2110 + const u32 slice_const)
2111 +{
2112 + const unsigned int endx = s->col_bd[tile_x + 1] - 1;
2113 + const unsigned int endy = (pause_mode == PAUSE_MODE_WPP) ?
2114 + ctb_row : s->row_bd[tile_y + 1] - 1;
2115 +
2116 + p1_apb_write(de, RPI_TILESTART,
2117 + s->col_bd[tile_x] | (s->row_bd[tile_y] << 16));
2118 + p1_apb_write(de, RPI_TILEEND, endx | (endy << 16));
2119 +
2120 + if (do_bte)
2121 + p1_apb_write(de, RPI_BEGINTILEEND, endx | (endy << 16));
2122 +
2123 + write_slice(de, s, slice_const, endx, endy);
2124 +
2125 + if (reset_qp_y) {
2126 + unsigned int sps_qp_bd_offset =
2127 + 6 * s->sps.bit_depth_luma_minus8;
2128 +
2129 + p1_apb_write(de, RPI_QP, sps_qp_bd_offset + slice_qp);
2130 + }
2131 +
2132 + p1_apb_write(de, RPI_MODE,
2133 + pause_mode |
2134 + ((endx == s->ctb_width - 1) << 17) |
2135 + ((endy == s->ctb_height - 1) << 18));
2136 +
2137 + p1_apb_write(de, RPI_CONTROL, (ctb_col << 0) | (ctb_row << 16));
2138 +
2139 + de->entry_tile_x = tile_x;
2140 + de->entry_tile_y = tile_y;
2141 + de->entry_ctb_x = ctb_col;
2142 + de->entry_ctb_y = ctb_row;
2143 + de->entry_qp = slice_qp;
2144 + de->entry_slice = slice_const;
2145 +}
2146 +
2147 +//////////////////////////////////////////////////////////////////////////////
2148 +// Wavefront mode
2149 +
2150 +#define CMDS_WPP_PAUSE 4
2151 +static void wpp_pause(struct rpivid_dec_env *const de, int ctb_row)
2152 +{
2153 + p1_apb_write(de, RPI_STATUS, (ctb_row << 18) | 0x25);
2154 + p1_apb_write(de, RPI_TRANSFER, PROB_BACKUP);
2155 + p1_apb_write(de, RPI_MODE,
2156 + ctb_row == de->pic_height_in_ctbs_y - 1 ?
2157 + 0x70000 : 0x30000);
2158 + p1_apb_write(de, RPI_CONTROL, (ctb_row << 16) + 2);
2159 +}
2160 +
2161 +#define CMDS_WPP_ENTRY_FILL_1 (CMDS_WPP_PAUSE + 2 + CMDS_NEW_ENTRY_POINT)
2162 +static int wpp_entry_fill(struct rpivid_dec_env *const de,
2163 + const struct rpivid_dec_state *const s,
2164 + const unsigned int last_y)
2165 +{
2166 + int rv;
2167 + const unsigned int last_x = s->ctb_width - 1;
2168 +
2169 + rv = cmds_check_space(de, CMDS_WPP_ENTRY_FILL_1 *
2170 + (last_y - de->entry_ctb_y));
2171 + if (rv)
2172 + return rv;
2173 +
2174 + while (de->entry_ctb_y < last_y) {
2175 + /* de->entry_ctb_x/y are updated by new_entry_point() */
2176 + if (s->ctb_width > 2)
2177 + wpp_pause(de, de->entry_ctb_y);
2178 + p1_apb_write(de, RPI_STATUS,
2179 + (de->entry_ctb_y << 18) | (last_x << 5) | 2);
2180 +
2181 + /* if width == 1 then the saved state is the init one */
2182 + if (s->ctb_width == 2)
2183 + p1_apb_write(de, RPI_TRANSFER, PROB_BACKUP);
2184 + else
2185 + p1_apb_write(de, RPI_TRANSFER, PROB_RELOAD);
2186 +
2187 + new_entry_point(de, s, false, true, PAUSE_MODE_WPP,
2188 + 0, 0, 0, de->entry_ctb_y + 1,
2189 + de->entry_qp, de->entry_slice);
2190 + }
2191 + return 0;
2192 +}
2193 +
2194 +static int wpp_end_previous_slice(struct rpivid_dec_env *const de,
2195 + const struct rpivid_dec_state *const s)
2196 +{
2197 + int rv;
2198 +
2199 + rv = wpp_entry_fill(de, s, s->prev_ctb_y);
2200 + if (rv)
2201 + return rv;
2202 +
2203 + rv = cmds_check_space(de, CMDS_WPP_PAUSE + 2);
2204 + if (rv)
2205 + return rv;
2206 +
2207 + if (de->entry_ctb_x < 2 &&
2208 + (de->entry_ctb_y < s->start_ctb_y || s->start_ctb_x > 2) &&
2209 + s->ctb_width > 2)
2210 + wpp_pause(de, s->prev_ctb_y);
2211 + p1_apb_write(de, RPI_STATUS,
2212 + 1 | (s->prev_ctb_x << 5) | (s->prev_ctb_y << 18));
2213 + if (s->start_ctb_x == 2 ||
2214 + (s->ctb_width == 2 && de->entry_ctb_y < s->start_ctb_y))
2215 + p1_apb_write(de, RPI_TRANSFER, PROB_BACKUP);
2216 + return 0;
2217 +}
2218 +
2219 +/* Only main profile supported, so WPP => !Tiles, which makes some of the
2220 + * code below simpler
2221 + */
2222 +static int wpp_decode_slice(struct rpivid_dec_env *const de,
2223 + const struct rpivid_dec_state *const s,
2224 + bool last_slice)
2225 +{
2226 + bool reset_qp_y = true;
2227 + const bool indep = !s->dependent_slice_segment_flag;
2228 + int rv;
2229 +
2230 + if (s->start_ts) {
2231 + rv = wpp_end_previous_slice(de, s);
2232 + if (rv)
2233 + return rv;
2234 + }
2235 + pre_slice_decode(de, s);
2236 +
2237 + rv = cmds_check_space(de,
2238 + CMDS_WRITE_BITSTREAM +
2239 + CMDS_WRITE_PROB +
2240 + CMDS_PROGRAM_SLICECMDS +
2241 + CMDS_NEW_SLICE_SEGMENT +
2242 + CMDS_NEW_ENTRY_POINT);
2243 + if (rv)
2244 + return rv;
2245 +
2246 + rv = write_bitstream(de, s);
2247 + if (rv)
2248 + return rv;
2249 +
2250 + if (!s->start_ts || indep || s->ctb_width == 1)
2251 + write_prob(de, s);
2252 + else if (!s->start_ctb_x)
2253 + p1_apb_write(de, RPI_TRANSFER, PROB_RELOAD);
2254 + else
2255 + reset_qp_y = false;
2256 +
2257 + program_slicecmds(de, s->slice_idx);
2258 + new_slice_segment(de, s);
2259 + new_entry_point(de, s, indep, reset_qp_y, PAUSE_MODE_WPP,
2260 + 0, 0, s->start_ctb_x, s->start_ctb_y,
2261 + s->slice_qp, slice_reg_const(s));
2262 +
2263 + if (last_slice) {
2264 + rv = wpp_entry_fill(de, s, s->ctb_height - 1);
2265 + if (rv)
2266 + return rv;
2267 +
2268 + rv = cmds_check_space(de, CMDS_WPP_PAUSE + 1);
2269 + if (rv)
2270 + return rv;
2271 +
2272 + if (de->entry_ctb_x < 2 && s->ctb_width > 2)
2273 + wpp_pause(de, s->ctb_height - 1);
2274 +
2275 + p1_apb_write(de, RPI_STATUS,
2276 + 1 | ((s->ctb_width - 1) << 5) |
2277 + ((s->ctb_height - 1) << 18));
2278 + }
2279 + return 0;
2280 +}
2281 +
2282 +//////////////////////////////////////////////////////////////////////////////
2283 +// Tiles mode
2284 +
2285 +// Guarantees 1 cmd entry free on exit
2286 +static int tile_entry_fill(struct rpivid_dec_env *const de,
2287 + const struct rpivid_dec_state *const s,
2288 + const unsigned int last_tile_x,
2289 + const unsigned int last_tile_y)
2290 +{
2291 + while (de->entry_tile_y < last_tile_y ||
2292 + (de->entry_tile_y == last_tile_y &&
2293 + de->entry_tile_x < last_tile_x)) {
2294 + int rv;
2295 + unsigned int t_x = de->entry_tile_x;
2296 + unsigned int t_y = de->entry_tile_y;
2297 + const unsigned int last_x = s->col_bd[t_x + 1] - 1;
2298 + const unsigned int last_y = s->row_bd[t_y + 1] - 1;
2299 +
2300 + // One more than needed here
2301 + rv = cmds_check_space(de, CMDS_NEW_ENTRY_POINT + 3);
2302 + if (rv)
2303 + return rv;
2304 +
2305 + p1_apb_write(de, RPI_STATUS,
2306 + 2 | (last_x << 5) | (last_y << 18));
2307 + p1_apb_write(de, RPI_TRANSFER, PROB_RELOAD);
2308 +
2309 + // Inc tile
2310 + if (++t_x >= s->tile_width) {
2311 + t_x = 0;
2312 + ++t_y;
2313 + }
2314 +
2315 + new_entry_point(de, s, false, true, PAUSE_MODE_TILE,
2316 + t_x, t_y, s->col_bd[t_x], s->row_bd[t_y],
2317 + de->entry_qp, de->entry_slice);
2318 + }
2319 + return 0;
2320 +}
2321 +
2322 +/*
2323 + * Write STATUS register with expected end CTU address of previous slice
2324 + */
2325 +static int end_previous_slice(struct rpivid_dec_env *const de,
2326 + const struct rpivid_dec_state *const s)
2327 +{
2328 + int rv;
2329 +
2330 + rv = tile_entry_fill(de, s,
2331 + ctb_to_tile_x(s, s->prev_ctb_x),
2332 + ctb_to_tile_y(s, s->prev_ctb_y));
2333 + if (rv)
2334 + return rv;
2335 +
2336 + p1_apb_write(de, RPI_STATUS,
2337 + 1 | (s->prev_ctb_x << 5) | (s->prev_ctb_y << 18));
2338 + return 0;
2339 +}
2340 +
2341 +static int decode_slice(struct rpivid_dec_env *const de,
2342 + const struct rpivid_dec_state *const s,
2343 + bool last_slice)
2344 +{
2345 + bool reset_qp_y;
2346 + unsigned int tile_x = ctb_to_tile_x(s, s->start_ctb_x);
2347 + unsigned int tile_y = ctb_to_tile_y(s, s->start_ctb_y);
2348 + int rv;
2349 +
2350 + if (s->start_ts) {
2351 + rv = end_previous_slice(de, s);
2352 + if (rv)
2353 + return rv;
2354 + }
2355 +
2356 + rv = cmds_check_space(de,
2357 + CMDS_WRITE_BITSTREAM +
2358 + CMDS_WRITE_PROB +
2359 + CMDS_PROGRAM_SLICECMDS +
2360 + CMDS_NEW_SLICE_SEGMENT +
2361 + CMDS_NEW_ENTRY_POINT);
2362 + if (rv)
2363 + return rv;
2364 +
2365 + pre_slice_decode(de, s);
2366 + rv = write_bitstream(de, s);
2367 + if (rv)
2368 + return rv;
2369 +
2370 + reset_qp_y = !s->start_ts ||
2371 + !s->dependent_slice_segment_flag ||
2372 + tile_x != ctb_to_tile_x(s, s->prev_ctb_x) ||
2373 + tile_y != ctb_to_tile_y(s, s->prev_ctb_y);
2374 + if (reset_qp_y)
2375 + write_prob(de, s);
2376 +
2377 + program_slicecmds(de, s->slice_idx);
2378 + new_slice_segment(de, s);
2379 + new_entry_point(de, s, !s->dependent_slice_segment_flag, reset_qp_y,
2380 + PAUSE_MODE_TILE,
2381 + tile_x, tile_y, s->start_ctb_x, s->start_ctb_y,
2382 + s->slice_qp, slice_reg_const(s));
2383 +
2384 + /*
2385 + * If this is the last slice then fill in the other tile entries
2386 + * now, otherwise this will be done at the start of the next slice
2387 + * when it will be known where this slice finishes
2388 + */
2389 + if (last_slice) {
2390 + rv = tile_entry_fill(de, s,
2391 + s->tile_width - 1,
2392 + s->tile_height - 1);
2393 + if (rv)
2394 + return rv;
2395 + p1_apb_write(de, RPI_STATUS,
2396 + 1 | ((s->ctb_width - 1) << 5) |
2397 + ((s->ctb_height - 1) << 18));
2398 + }
2399 + return 0;
2400 +}
2401 +
2402 +//////////////////////////////////////////////////////////////////////////////
2403 +// Scaling factors
2404 +
2405 +static void expand_scaling_list(const unsigned int size_id,
2406 + u8 *const dst0,
2407 + const u8 *const src0, uint8_t dc)
2408 +{
2409 + u8 *d;
2410 + unsigned int x, y;
2411 +
2412 + switch (size_id) {
2413 + case 0:
2414 + memcpy(dst0, src0, 16);
2415 + break;
2416 + case 1:
2417 + memcpy(dst0, src0, 64);
2418 + break;
2419 + case 2:
2420 + d = dst0;
2421 +
2422 + for (y = 0; y != 16; y++) {
2423 + const u8 *s = src0 + (y >> 1) * 8;
2424 +
2425 + for (x = 0; x != 8; ++x) {
2426 + *d++ = *s;
2427 + *d++ = *s++;
2428 + }
2429 + }
2430 + dst0[0] = dc;
2431 + break;
2432 + default:
2433 + d = dst0;
2434 +
2435 + for (y = 0; y != 32; y++) {
2436 + const u8 *s = src0 + (y >> 2) * 8;
2437 +
2438 + for (x = 0; x != 8; ++x) {
2439 + *d++ = *s;
2440 + *d++ = *s;
2441 + *d++ = *s;
2442 + *d++ = *s++;
2443 + }
2444 + }
2445 + dst0[0] = dc;
2446 + break;
2447 + }
2448 +}
2449 +
2450 +static void populate_scaling_factors(const struct rpivid_run *const run,
2451 + struct rpivid_dec_env *const de,
2452 + const struct rpivid_dec_state *const s)
2453 +{
2454 + const struct v4l2_ctrl_hevc_scaling_matrix *const sl =
2455 + run->h265.scaling_matrix;
2456 + // Array of constants for scaling factors
2457 + static const u32 scaling_factor_offsets[4][6] = {
2458 + // MID0 MID1 MID2 MID3 MID4 MID5
2459 + // SID0 (4x4)
2460 + { 0x0000, 0x0010, 0x0020, 0x0030, 0x0040, 0x0050 },
2461 + // SID1 (8x8)
2462 + { 0x0060, 0x00A0, 0x00E0, 0x0120, 0x0160, 0x01A0 },
2463 + // SID2 (16x16)
2464 + { 0x01E0, 0x02E0, 0x03E0, 0x04E0, 0x05E0, 0x06E0 },
2465 + // SID3 (32x32)
2466 + { 0x07E0, 0x0BE0, 0x0000, 0x0000, 0x0000, 0x0000 }
2467 + };
2468 +
2469 + unsigned int mid;
2470 +
2471 + for (mid = 0; mid < 6; mid++)
2472 + expand_scaling_list(0, de->scaling_factors +
2473 + scaling_factor_offsets[0][mid],
2474 + sl->scaling_list_4x4[mid], 0);
2475 + for (mid = 0; mid < 6; mid++)
2476 + expand_scaling_list(1, de->scaling_factors +
2477 + scaling_factor_offsets[1][mid],
2478 + sl->scaling_list_8x8[mid], 0);
2479 + for (mid = 0; mid < 6; mid++)
2480 + expand_scaling_list(2, de->scaling_factors +
2481 + scaling_factor_offsets[2][mid],
2482 + sl->scaling_list_16x16[mid],
2483 + sl->scaling_list_dc_coef_16x16[mid]);
2484 + for (mid = 0; mid < 2; mid++)
2485 + expand_scaling_list(3, de->scaling_factors +
2486 + scaling_factor_offsets[3][mid],
2487 + sl->scaling_list_32x32[mid],
2488 + sl->scaling_list_dc_coef_32x32[mid]);
2489 +}
2490 +
2491 +static void free_ps_info(struct rpivid_dec_state *const s)
2492 +{
2493 + kfree(s->ctb_addr_rs_to_ts);
2494 + s->ctb_addr_rs_to_ts = NULL;
2495 + kfree(s->ctb_addr_ts_to_rs);
2496 + s->ctb_addr_ts_to_rs = NULL;
2497 +
2498 + kfree(s->col_bd);
2499 + s->col_bd = NULL;
2500 + kfree(s->row_bd);
2501 + s->row_bd = NULL;
2502 +}
2503 +
2504 +static unsigned int tile_width(const struct rpivid_dec_state *const s,
2505 + const unsigned int t_x)
2506 +{
2507 + return s->col_bd[t_x + 1] - s->col_bd[t_x];
2508 +}
2509 +
2510 +static unsigned int tile_height(const struct rpivid_dec_state *const s,
2511 + const unsigned int t_y)
2512 +{
2513 + return s->row_bd[t_y + 1] - s->row_bd[t_y];
2514 +}
2515 +
2516 +static void fill_rs_to_ts(struct rpivid_dec_state *const s)
2517 +{
2518 + unsigned int ts = 0;
2519 + unsigned int t_y;
2520 + unsigned int tr_rs = 0;
2521 +
2522 + for (t_y = 0; t_y != s->tile_height; ++t_y) {
2523 + const unsigned int t_h = tile_height(s, t_y);
2524 + unsigned int t_x;
2525 + unsigned int tc_rs = tr_rs;
2526 +
2527 + for (t_x = 0; t_x != s->tile_width; ++t_x) {
2528 + const unsigned int t_w = tile_width(s, t_x);
2529 + unsigned int y;
2530 + unsigned int rs = tc_rs;
2531 +
2532 + for (y = 0; y != t_h; ++y) {
2533 + unsigned int x;
2534 +
2535 + for (x = 0; x != t_w; ++x) {
2536 + s->ctb_addr_rs_to_ts[rs + x] = ts;
2537 + s->ctb_addr_ts_to_rs[ts] = rs + x;
2538 + ++ts;
2539 + }
2540 + rs += s->ctb_width;
2541 + }
2542 + tc_rs += t_w;
2543 + }
2544 + tr_rs += t_h * s->ctb_width;
2545 + }
2546 +}
2547 +
2548 +static int updated_ps(struct rpivid_dec_state *const s)
2549 +{
2550 + unsigned int i;
2551 +
2552 + free_ps_info(s);
2553 +
2554 + // Inferred parameters
2555 + s->log2_ctb_size = s->sps.log2_min_luma_coding_block_size_minus3 + 3 +
2556 + s->sps.log2_diff_max_min_luma_coding_block_size;
2557 +
2558 + s->ctb_width = (s->sps.pic_width_in_luma_samples +
2559 + (1 << s->log2_ctb_size) - 1) >>
2560 + s->log2_ctb_size;
2561 + s->ctb_height = (s->sps.pic_height_in_luma_samples +
2562 + (1 << s->log2_ctb_size) - 1) >>
2563 + s->log2_ctb_size;
2564 + s->ctb_size = s->ctb_width * s->ctb_height;
2565 +
2566 + // CTB address maps (raster scan <-> tile scan)
2567 +
2568 + s->ctb_addr_rs_to_ts = kmalloc_array(s->ctb_size,
2569 + sizeof(*s->ctb_addr_rs_to_ts),
2570 + GFP_KERNEL);
2571 + if (!s->ctb_addr_rs_to_ts)
2572 + goto fail;
2573 + s->ctb_addr_ts_to_rs = kmalloc_array(s->ctb_size,
2574 + sizeof(*s->ctb_addr_ts_to_rs),
2575 + GFP_KERNEL);
2576 + if (!s->ctb_addr_ts_to_rs)
2577 + goto fail;
2578 +
2579 + if (!(s->pps.flags & V4L2_HEVC_PPS_FLAG_TILES_ENABLED)) {
2580 + s->tile_width = 1;
2581 + s->tile_height = 1;
2582 + } else {
2583 + s->tile_width = s->pps.num_tile_columns_minus1 + 1;
2584 + s->tile_height = s->pps.num_tile_rows_minus1 + 1;
2585 + }
2586 +
2587 + s->col_bd = kmalloc((s->tile_width + 1) * sizeof(*s->col_bd),
2588 + GFP_KERNEL);
2589 + if (!s->col_bd)
2590 + goto fail;
2591 + s->row_bd = kmalloc((s->tile_height + 1) * sizeof(*s->row_bd),
2592 + GFP_KERNEL);
2593 + if (!s->row_bd)
2594 + goto fail;
2595 +
2596 + s->col_bd[0] = 0;
2597 + for (i = 1; i < s->tile_width; i++)
2598 + s->col_bd[i] = s->col_bd[i - 1] +
2599 + s->pps.column_width_minus1[i - 1] + 1;
2600 + s->col_bd[s->tile_width] = s->ctb_width;
2601 +
2602 + s->row_bd[0] = 0;
2603 + for (i = 1; i < s->tile_height; i++)
2604 + s->row_bd[i] = s->row_bd[i - 1] +
2605 + s->pps.row_height_minus1[i - 1] + 1;
2606 + s->row_bd[s->tile_height] = s->ctb_height;
2607 +
2608 + fill_rs_to_ts(s);
2609 + return 0;
2610 +
2611 +fail:
2612 + free_ps_info(s);
2613 + /* Set invalid to force reload */
2614 + s->sps.pic_width_in_luma_samples = 0;
2615 + return -ENOMEM;
2616 +}
2617 +
2618 +static int write_cmd_buffer(struct rpivid_dev *const dev,
2619 + struct rpivid_dec_env *const de,
2620 + const struct rpivid_dec_state *const s)
2621 +{
2622 + const size_t cmd_size = ALIGN(de->cmd_len * sizeof(de->cmd_fifo[0]),
2623 + dev->cache_align);
2624 +
2625 + de->cmd_addr = dma_map_single(dev->dev, de->cmd_fifo,
2626 + cmd_size, DMA_TO_DEVICE);
2627 + if (dma_mapping_error(dev->dev, de->cmd_addr)) {
2628 + v4l2_err(&dev->v4l2_dev,
2629 + "Map cmd buffer (%zu): FAILED\n", cmd_size);
2630 + return -ENOMEM;
2631 + }
2632 + de->cmd_size = cmd_size;
2633 + return 0;
2634 +}
2635 +
2636 +static void setup_colmv(struct rpivid_ctx *const ctx, struct rpivid_run *run,
2637 + struct rpivid_dec_state *const s)
2638 +{
2639 + ctx->colmv_stride = ALIGN(s->sps.pic_width_in_luma_samples, 64);
2640 + ctx->colmv_picsize = ctx->colmv_stride *
2641 + (ALIGN(s->sps.pic_height_in_luma_samples, 64) >> 4);
2642 +}
2643 +
2644 +// Can be called from irq context
2645 +static struct rpivid_dec_env *dec_env_new(struct rpivid_ctx *const ctx)
2646 +{
2647 + struct rpivid_dec_env *de;
2648 + unsigned long lock_flags;
2649 +
2650 + spin_lock_irqsave(&ctx->dec_lock, lock_flags);
2651 +
2652 + de = ctx->dec_free;
2653 + if (de) {
2654 + ctx->dec_free = de->next;
2655 + de->next = NULL;
2656 + de->state = RPIVID_DECODE_SLICE_START;
2657 + }
2658 +
2659 + spin_unlock_irqrestore(&ctx->dec_lock, lock_flags);
2660 + return de;
2661 +}
2662 +
2663 +// Can be called from irq context
2664 +static void dec_env_delete(struct rpivid_dec_env *const de)
2665 +{
2666 + struct rpivid_ctx * const ctx = de->ctx;
2667 + unsigned long lock_flags;
2668 +
2669 + if (de->cmd_size) {
2670 + dma_unmap_single(ctx->dev->dev, de->cmd_addr, de->cmd_size,
2671 + DMA_TO_DEVICE);
2672 + de->cmd_size = 0;
2673 + }
2674 +
2675 + aux_q_release(ctx, &de->frame_aux);
2676 + aux_q_release(ctx, &de->col_aux);
2677 +
2678 + spin_lock_irqsave(&ctx->dec_lock, lock_flags);
2679 +
2680 + de->state = RPIVID_DECODE_END;
2681 + de->next = ctx->dec_free;
2682 + ctx->dec_free = de;
2683 +
2684 + spin_unlock_irqrestore(&ctx->dec_lock, lock_flags);
2685 +}
2686 +
2687 +static void dec_env_uninit(struct rpivid_ctx *const ctx)
2688 +{
2689 + unsigned int i;
2690 +
2691 + if (ctx->dec_pool) {
2692 + for (i = 0; i != RPIVID_DEC_ENV_COUNT; ++i) {
2693 + struct rpivid_dec_env *const de = ctx->dec_pool + i;
2694 +
2695 + kfree(de->cmd_fifo);
2696 + }
2697 +
2698 + kfree(ctx->dec_pool);
2699 + }
2700 +
2701 + ctx->dec_pool = NULL;
2702 + ctx->dec_free = NULL;
2703 +}
2704 +
2705 +static int dec_env_init(struct rpivid_ctx *const ctx)
2706 +{
2707 + unsigned int i;
2708 +
2709 + ctx->dec_pool = kzalloc(sizeof(*ctx->dec_pool) * RPIVID_DEC_ENV_COUNT,
2710 + GFP_KERNEL);
2711 + if (!ctx->dec_pool)
2712 + return -1;
2713 +
2714 + spin_lock_init(&ctx->dec_lock);
2715 +
2716 + // Build free chain
2717 + ctx->dec_free = ctx->dec_pool;
2718 + for (i = 0; i != RPIVID_DEC_ENV_COUNT - 1; ++i)
2719 + ctx->dec_pool[i].next = ctx->dec_pool + i + 1;
2720 +
2721 + // Fill in other bits
2722 + for (i = 0; i != RPIVID_DEC_ENV_COUNT; ++i) {
2723 + struct rpivid_dec_env *const de = ctx->dec_pool + i;
2724 +
2725 + de->ctx = ctx;
2726 + de->decode_order = i;
2727 +// de->cmd_max = 1024;
2728 + de->cmd_max = 8096;
2729 + de->cmd_fifo = kmalloc_array(de->cmd_max,
2730 + sizeof(struct rpi_cmd),
2731 + GFP_KERNEL);
2732 + if (!de->cmd_fifo)
2733 + goto fail;
2734 + }
2735 +
2736 + return 0;
2737 +
2738 +fail:
2739 + dec_env_uninit(ctx);
2740 + return -1;
2741 +}
2742 +
2743 +// Assume that we get exactly the same DPB for every slice
2744 +// it makes no real sense otherwise
2745 +#if V4L2_HEVC_DPB_ENTRIES_NUM_MAX > 16
2746 +#error HEVC_DPB_ENTRIES > h/w slots
2747 +#endif
2748 +
2749 +static u32 mk_config2(const struct rpivid_dec_state *const s)
2750 +{
2751 + const struct v4l2_ctrl_hevc_sps *const sps = &s->sps;
2752 + const struct v4l2_ctrl_hevc_pps *const pps = &s->pps;
2753 + u32 c;
2754 + // BitDepthY
2755 + c = (sps->bit_depth_luma_minus8 + 8) << 0;
2756 + // BitDepthC
2757 + c |= (sps->bit_depth_chroma_minus8 + 8) << 4;
2758 + // BitDepthY > 8
2759 + if (sps->bit_depth_luma_minus8)
2760 + c |= BIT(8);
2761 + // BitDepthC > 8
2762 + if (sps->bit_depth_chroma_minus8)
2763 + c |= BIT(9);
2764 + c |= s->log2_ctb_size << 10;
2765 + if (pps->flags & V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED)
2766 + c |= BIT(13);
2767 + if (sps->flags & V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED)
2768 + c |= BIT(14);
2769 + if (s->mk_aux)
2770 + c |= BIT(15); /* Write motion vectors to external memory */
2771 + c |= (pps->log2_parallel_merge_level_minus2 + 2) << 16;
2772 + if (s->slice_temporal_mvp)
2773 + c |= BIT(19);
2774 + if (sps->flags & V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED)
2775 + c |= BIT(20);
2776 + c |= (pps->pps_cb_qp_offset & 31) << 21;
2777 + c |= (pps->pps_cr_qp_offset & 31) << 26;
2778 + return c;
2779 +}
2780 +
2781 +static inline bool is_ref_unit_type(const unsigned int nal_unit_type)
2782 +{
2783 + /* From Table 7-1
2784 + * True for 1, 3, 5, 7, 9, 11, 13, 15 and for all types >= 16
2785 + */
2786 + return (nal_unit_type & ~0xe) != 0;
2787 +}
2788 +
2789 +static void rpivid_h265_setup(struct rpivid_ctx *ctx, struct rpivid_run *run)
2790 +{
2791 + struct rpivid_dev *const dev = ctx->dev;
2792 + const struct v4l2_ctrl_hevc_decode_params *const dec =
2793 + run->h265.dec;
2794 + /* sh0 used where slice header contents should be constant over all
2795 + * slices, or first slice of frame
2796 + */
2797 + const struct v4l2_ctrl_hevc_slice_params *const sh0 =
2798 + run->h265.slice_params;
2799 + struct rpivid_q_aux *dpb_q_aux[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
2800 + struct rpivid_dec_state *const s = ctx->state;
2801 + struct vb2_queue *vq;
2802 + struct rpivid_dec_env *de = ctx->dec0;
2803 + unsigned int prev_rs;
2804 + unsigned int i;
2805 + int rv;
2806 + bool slice_temporal_mvp;
2807 + bool frame_end;
2808 +
2809 + xtrace_in(dev, de);
2810 + s->sh = NULL; // Avoid use until in the slice loop
2811 +
2812 + frame_end =
2813 + ((run->src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF) == 0);
2814 +
2815 + slice_temporal_mvp = (sh0->flags &
2816 + V4L2_HEVC_SLICE_PARAMS_FLAG_SLICE_TEMPORAL_MVP_ENABLED);
2817 +
2818 + if (de && de->state != RPIVID_DECODE_END) {
2819 + switch (de->state) {
2820 + case RPIVID_DECODE_SLICE_CONTINUE:
2821 + // Expected state
2822 + break;
2823 + default:
2824 + v4l2_err(&dev->v4l2_dev, "%s: Unexpected state: %d\n",
2825 + __func__, de->state);
2826 + fallthrough;
2827 + case RPIVID_DECODE_ERROR_CONTINUE:
2828 + // Uncleared error - fail now
2829 + goto fail;
2830 + }
2831 +
2832 + if (s->slice_temporal_mvp != slice_temporal_mvp) {
2833 + v4l2_warn(&dev->v4l2_dev,
2834 + "Slice Temporal MVP non-constant\n");
2835 + goto fail;
2836 + }
2837 + } else {
2838 + /* Frame start */
2839 + unsigned int ctb_size_y;
2840 + bool sps_changed = false;
2841 +
2842 + if (memcmp(&s->sps, run->h265.sps, sizeof(s->sps)) != 0) {
2843 + /* SPS changed */
2844 + v4l2_info(&dev->v4l2_dev, "SPS changed\n");
2845 + memcpy(&s->sps, run->h265.sps, sizeof(s->sps));
2846 + sps_changed = true;
2847 + }
2848 + if (sps_changed ||
2849 + memcmp(&s->pps, run->h265.pps, sizeof(s->pps)) != 0) {
2850 + /* SPS or PPS changed */
2851 + v4l2_info(&dev->v4l2_dev, "PPS changed\n");
2852 + memcpy(&s->pps, run->h265.pps, sizeof(s->pps));
2853 +
2854 + /* Recalc stuff as required */
2855 + rv = updated_ps(s);
2856 + if (rv)
2857 + goto fail;
2858 + }
2859 +
2860 + de = dec_env_new(ctx);
2861 + if (!de) {
2862 + v4l2_err(&dev->v4l2_dev,
2863 + "Failed to find free decode env\n");
2864 + goto fail;
2865 + }
2866 + ctx->dec0 = de;
2867 +
2868 + ctb_size_y =
2869 + 1U << (s->sps.log2_min_luma_coding_block_size_minus3 +
2870 + 3 +
2871 + s->sps.log2_diff_max_min_luma_coding_block_size);
2872 +
2873 + de->pic_width_in_ctbs_y =
2874 + (s->sps.pic_width_in_luma_samples + ctb_size_y - 1) /
2875 + ctb_size_y; // 7-15
2876 + de->pic_height_in_ctbs_y =
2877 + (s->sps.pic_height_in_luma_samples + ctb_size_y - 1) /
2878 + ctb_size_y; // 7-17
2879 + de->cmd_len = 0;
2880 + de->dpbno_col = ~0U;
2881 +
2882 + de->bit_copy_gptr = ctx->bitbufs + ctx->p1idx;
2883 + de->bit_copy_len = 0;
2884 +
2885 + de->frame_c_offset = ctx->dst_fmt.height * 128;
2886 + de->frame_stride = ctx->dst_fmt.plane_fmt[0].bytesperline * 128;
2887 + de->frame_addr =
2888 + vb2_dma_contig_plane_dma_addr(&run->dst->vb2_buf, 0);
2889 + de->frame_aux = NULL;
2890 +
2891 + if (s->sps.bit_depth_luma_minus8 !=
2892 + s->sps.bit_depth_chroma_minus8) {
2893 + v4l2_warn(&dev->v4l2_dev,
2894 + "Chroma depth (%d) != Luma depth (%d)\n",
2895 + s->sps.bit_depth_chroma_minus8 + 8,
2896 + s->sps.bit_depth_luma_minus8 + 8);
2897 + goto fail;
2898 + }
2899 + if (s->sps.bit_depth_luma_minus8 == 0) {
2900 + if (ctx->dst_fmt.pixelformat !=
2901 + V4L2_PIX_FMT_NV12_COL128) {
2902 + v4l2_err(&dev->v4l2_dev,
2903 + "Pixel format %#x != NV12_COL128 for 8-bit output\n",
2904 + ctx->dst_fmt.pixelformat);
2905 + goto fail;
2906 + }
2907 + } else if (s->sps.bit_depth_luma_minus8 == 2) {
2908 + if (ctx->dst_fmt.pixelformat !=
2909 + V4L2_PIX_FMT_NV12_10_COL128) {
2910 + v4l2_err(&dev->v4l2_dev,
2911 + "Pixel format %#x != NV12_10_COL128 for 10-bit output\n",
2912 + ctx->dst_fmt.pixelformat);
2913 + goto fail;
2914 + }
2915 + } else {
2916 + v4l2_warn(&dev->v4l2_dev,
2917 + "Luma depth (%d) unsupported\n",
2918 + s->sps.bit_depth_luma_minus8 + 8);
2919 + goto fail;
2920 + }
2921 + if (run->dst->vb2_buf.num_planes != 1) {
2922 + v4l2_warn(&dev->v4l2_dev, "Capture planes (%d) != 1\n",
2923 + run->dst->vb2_buf.num_planes);
2924 + goto fail;
2925 + }
2926 + if (run->dst->planes[0].length <
2927 + ctx->dst_fmt.plane_fmt[0].sizeimage) {
2928 + v4l2_warn(&dev->v4l2_dev,
2929 + "Capture plane[0] length (%d) < sizeimage (%d)\n",
2930 + run->dst->planes[0].length,
2931 + ctx->dst_fmt.plane_fmt[0].sizeimage);
2932 + goto fail;
2933 + }
2934 +
2935 + // Fill in ref planes with our address s.t. if we mess
2936 + // up refs somehow then we still have a valid address
2937 + // entry
2938 + for (i = 0; i != 16; ++i)
2939 + de->ref_addrs[i] = de->frame_addr;
2940 +
2941 + /*
2942 + * Stash initial temporal_mvp flag
2943 + * This must be the same for all pic slices (7.4.7.1)
2944 + */
2945 + s->slice_temporal_mvp = slice_temporal_mvp;
2946 +
2947 + /*
2948 + * Need Aux ents for all (ref) DPB ents if temporal MV could
2949 + * be enabled for any pic
2950 + */
2951 + s->use_aux = ((s->sps.flags &
2952 + V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED) != 0);
2953 + s->mk_aux = s->use_aux &&
2954 + (s->sps.sps_max_sub_layers_minus1 >= sh0->nuh_temporal_id_plus1 ||
2955 + is_ref_unit_type(sh0->nal_unit_type));
2956 +
2957 + // Phase 2 reg pre-calc
2958 + de->rpi_config2 = mk_config2(s);
2959 + de->rpi_framesize = (s->sps.pic_height_in_luma_samples << 16) |
2960 + s->sps.pic_width_in_luma_samples;
2961 + de->rpi_currpoc = sh0->slice_pic_order_cnt;
2962 +
2963 + if (s->sps.flags &
2964 + V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED) {
2965 + setup_colmv(ctx, run, s);
2966 + }
2967 +
2968 + s->slice_idx = 0;
2969 +
2970 + if (sh0->slice_segment_addr != 0) {
2971 + v4l2_warn(&dev->v4l2_dev,
2972 + "New frame but segment_addr=%d\n",
2973 + sh0->slice_segment_addr);
2974 + goto fail;
2975 + }
2976 +
2977 + /* Allocate a bitbuf if we need one - don't need one if single
2978 + * slice as we can use the src buf directly
2979 + */
2980 + if (!frame_end && !de->bit_copy_gptr->ptr) {
2981 + size_t bits_alloc;
2982 + bits_alloc = rpivid_bit_buf_size(s->sps.pic_width_in_luma_samples,
2983 + s->sps.pic_height_in_luma_samples,
2984 + s->sps.bit_depth_luma_minus8);
2985 +
2986 + if (gptr_alloc(dev, de->bit_copy_gptr,
2987 + bits_alloc,
2988 + DMA_ATTR_FORCE_CONTIGUOUS) != 0) {
2989 + v4l2_err(&dev->v4l2_dev,
2990 + "Unable to alloc buf (%zu) for bit copy\n",
2991 + bits_alloc);
2992 + goto fail;
2993 + }
2994 + v4l2_info(&dev->v4l2_dev,
2995 + "Alloc buf (%zu) for bit copy OK\n",
2996 + bits_alloc);
2997 + }
2998 + }
2999 +
3000 + // Either map src buffer or use directly
3001 + s->src_addr = 0;
3002 + s->src_buf = NULL;
3003 +
3004 + if (frame_end)
3005 + s->src_addr = vb2_dma_contig_plane_dma_addr(&run->src->vb2_buf,
3006 + 0);
3007 + if (!s->src_addr)
3008 + s->src_buf = vb2_plane_vaddr(&run->src->vb2_buf, 0);
3009 + if (!s->src_addr && !s->src_buf) {
3010 + v4l2_err(&dev->v4l2_dev, "Failed to map src buffer\n");
3011 + goto fail;
3012 + }
3013 +
3014 + // Pre calc a few things
3015 + s->dec = dec;
3016 + for (i = 0; i != run->h265.slice_ents; ++i) {
3017 + const struct v4l2_ctrl_hevc_slice_params *const sh = sh0 + i;
3018 + const bool last_slice = frame_end && i + 1 == run->h265.slice_ents;
3019 +
3020 + s->sh = sh;
3021 +
3022 + if (run->src->planes[0].bytesused < (sh->bit_size + 7) / 8) {
3023 + v4l2_warn(&dev->v4l2_dev,
3024 + "Bit size %d > bytesused %d\n",
3025 + sh->bit_size, run->src->planes[0].bytesused);
3026 + goto fail;
3027 + }
3028 + if (sh->data_byte_offset >= sh->bit_size / 8) {
3029 + v4l2_warn(&dev->v4l2_dev,
3030 + "Bit size %u < Byte offset %u * 8\n",
3031 + sh->bit_size, sh->data_byte_offset);
3032 + goto fail;
3033 + }
3034 +
3035 + s->slice_qp = 26 + s->pps.init_qp_minus26 + sh->slice_qp_delta;
3036 + s->max_num_merge_cand = sh->slice_type == HEVC_SLICE_I ?
3037 + 0 :
3038 + (5 - sh->five_minus_max_num_merge_cand);
3039 + s->dependent_slice_segment_flag =
3040 + ((sh->flags &
3041 + V4L2_HEVC_SLICE_PARAMS_FLAG_DEPENDENT_SLICE_SEGMENT) != 0);
3042 +
3043 + s->nb_refs[0] = (sh->slice_type == HEVC_SLICE_I) ?
3044 + 0 :
3045 + sh->num_ref_idx_l0_active_minus1 + 1;
3046 + s->nb_refs[1] = (sh->slice_type != HEVC_SLICE_B) ?
3047 + 0 :
3048 + sh->num_ref_idx_l1_active_minus1 + 1;
3049 +
3050 + if (s->sps.flags & V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED)
3051 + populate_scaling_factors(run, de, s);
3052 +
3053 + /* Calc all the random coord info to avoid repeated conversion in/out */
3054 + s->start_ts = s->ctb_addr_rs_to_ts[sh->slice_segment_addr];
3055 + s->start_ctb_x = sh->slice_segment_addr % de->pic_width_in_ctbs_y;
3056 + s->start_ctb_y = sh->slice_segment_addr / de->pic_width_in_ctbs_y;
3057 + /* Last CTB of previous slice */
3058 + prev_rs = !s->start_ts ? 0 : s->ctb_addr_ts_to_rs[s->start_ts - 1];
3059 + s->prev_ctb_x = prev_rs % de->pic_width_in_ctbs_y;
3060 + s->prev_ctb_y = prev_rs / de->pic_width_in_ctbs_y;
3061 +
3062 + if ((s->pps.flags & V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED))
3063 + rv = wpp_decode_slice(de, s, last_slice);
3064 + else
3065 + rv = decode_slice(de, s, last_slice);
3066 + if (rv)
3067 + goto fail;
3068 +
3069 + ++s->slice_idx;
3070 + }
3071 +
3072 + if (!frame_end) {
3073 + xtrace_ok(dev, de);
3074 + return;
3075 + }
3076 +
3077 + // Frame end
3078 + memset(dpb_q_aux, 0,
3079 + sizeof(*dpb_q_aux) * V4L2_HEVC_DPB_ENTRIES_NUM_MAX);
3080 +
3081 + // Locate ref frames
3082 + // At least in the current implementation this is constant across all
3083 + // slices. If this changes we will need idx mapping code.
3084 + // Uses sh, so it is done here rather than in trigger
3085 +
3086 + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx,
3087 + V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE);
3088 +
3089 + if (!vq) {
3090 + v4l2_err(&dev->v4l2_dev, "VQ gone!\n");
3091 + goto fail;
3092 + }
3093 +
3094 + // v4l2_info(&dev->v4l2_dev, "rpivid_h265_end of frame\n");
3095 + if (write_cmd_buffer(dev, de, s))
3096 + goto fail;
3097 +
3098 + for (i = 0; i < dec->num_active_dpb_entries; ++i) {
3099 + struct vb2_buffer *buf = vb2_find_buffer(vq, dec->dpb[i].timestamp);
3100 + if (!buf) {
3101 + v4l2_warn(&dev->v4l2_dev,
3102 + "Missing DPB ent %d, timestamp=%lld\n",
3103 + i, (long long)dec->dpb[i].timestamp);
3104 + continue;
3105 + }
3106 +
3107 + if (s->use_aux) {
3108 + int buffer_index = buf->index;
3109 + dpb_q_aux[i] = aux_q_ref_idx(ctx, buffer_index);
3110 + if (!dpb_q_aux[i])
3111 + v4l2_warn(&dev->v4l2_dev,
3112 + "Missing DPB AUX ent %d, timestamp=%lld, index=%d\n",
3113 + i, (long long)dec->dpb[i].timestamp,
3114 + buffer_index);
3115 + }
3116 +
3117 + de->ref_addrs[i] =
3118 + vb2_dma_contig_plane_dma_addr(buf, 0);
3119 + }
3120 +
3121 + // Move DPB from temp
3122 + for (i = 0; i != V4L2_HEVC_DPB_ENTRIES_NUM_MAX; ++i) {
3123 + aux_q_release(ctx, &s->ref_aux[i]);
3124 + s->ref_aux[i] = dpb_q_aux[i];
3125 + }
3126 + // Unref the old frame aux too - it is either in the DPB or not
3127 + // now
3128 + aux_q_release(ctx, &s->frame_aux);
3129 +
3130 + if (s->mk_aux) {
3131 + s->frame_aux = aux_q_new(ctx, run->dst->vb2_buf.index);
3132 +
3133 + if (!s->frame_aux) {
3134 + v4l2_err(&dev->v4l2_dev,
3135 + "Failed to obtain aux storage for frame\n");
3136 + goto fail;
3137 + }
3138 +
3139 + de->frame_aux = aux_q_ref(ctx, s->frame_aux);
3140 + }
3141 +
3142 + if (de->dpbno_col != ~0U) {
3143 + if (de->dpbno_col >= dec->num_active_dpb_entries) {
3144 + v4l2_err(&dev->v4l2_dev,
3145 + "Col ref index %d >= %d\n",
3146 + de->dpbno_col,
3147 + dec->num_active_dpb_entries);
3148 + } else {
3149 + // Standard requires that the col pic is
3150 + // constant for the duration of the pic
3151 + // (text of collocated_ref_idx in H265-2 2018
3152 + // 7.4.7.1)
3153 +
3154 + // Spot the collocated ref in passing
3155 + de->col_aux = aux_q_ref(ctx,
3156 + dpb_q_aux[de->dpbno_col]);
3157 +
3158 + if (!de->col_aux) {
3159 + v4l2_warn(&dev->v4l2_dev,
3160 + "Missing DPB ent for col\n");
3161 + // Probably need to abort if this fails
3162 + // as P2 may explode on bad data
3163 + goto fail;
3164 + }
3165 + }
3166 + }
3167 +
3168 + de->state = RPIVID_DECODE_PHASE1;
3169 + xtrace_ok(dev, de);
3170 + return;
3171 +
3172 +fail:
3173 + if (de)
3174 + // Actual error reporting happens in Trigger
3175 + de->state = frame_end ? RPIVID_DECODE_ERROR_DONE :
3176 + RPIVID_DECODE_ERROR_CONTINUE;
3177 + xtrace_fail(dev, de);
3178 +}
3179 +
3180 +//////////////////////////////////////////////////////////////////////////////
3181 +// Handle PU and COEFF stream overflow
3182 +
3183 +// Returns:
3184 +// -1 Phase 1 decode error
3185 +// 0 OK
3186 +// >0 Out of space (bitmask)
3187 +
3188 +#define STATUS_COEFF_EXHAUSTED 8
3189 +#define STATUS_PU_EXHAUSTED 16
3190 +
3191 +static int check_status(const struct rpivid_dev *const dev)
3192 +{
3193 + const u32 cfstatus = apb_read(dev, RPI_CFSTATUS);
3194 + const u32 cfnum = apb_read(dev, RPI_CFNUM);
3195 + u32 status = apb_read(dev, RPI_STATUS);
3196 +
3197 + // Handle PU and COEFF stream overflow
3198 +
3199 + // This is the definition of successful completion of phase 1:
3200 + // it ensures that the status register is zero and all blocks in
3201 + // each tile have completed
3202 + if (cfstatus == cfnum)
3203 + return 0; //No error
3204 +
3205 + status &= (STATUS_PU_EXHAUSTED | STATUS_COEFF_EXHAUSTED);
3206 + if (status)
3207 + return status;
3208 +
3209 + return -1;
3210 +}
3211 +
3212 +static void phase2_cb(struct rpivid_dev *const dev, void *v)
3213 +{
3214 + struct rpivid_dec_env *const de = v;
3215 +
3216 + xtrace_in(dev, de);
3217 +
3218 + /* Done with buffers - allow new P1 */
3219 + rpivid_hw_irq_active1_enable_claim(dev, 1);
3220 +
3221 + v4l2_m2m_buf_done(de->frame_buf, VB2_BUF_STATE_DONE);
3222 + de->frame_buf = NULL;
3223 +
3224 +#if USE_REQUEST_PIN
3225 + media_request_unpin(de->req_pin);
3226 + de->req_pin = NULL;
3227 +#else
3228 + media_request_object_complete(de->req_obj);
3229 + de->req_obj = NULL;
3230 +#endif
3231 +
3232 + xtrace_ok(dev, de);
3233 + dec_env_delete(de);
3234 +}
3235 +
3236 +static void phase2_claimed(struct rpivid_dev *const dev, void *v)
3237 +{
3238 + struct rpivid_dec_env *const de = v;
3239 + unsigned int i;
3240 +
3241 + xtrace_in(dev, de);
3242 +
3243 + apb_write_vc_addr(dev, RPI_PURBASE, de->pu_base_vc);
3244 + apb_write_vc_len(dev, RPI_PURSTRIDE, de->pu_stride);
3245 + apb_write_vc_addr(dev, RPI_COEFFRBASE, de->coeff_base_vc);
3246 + apb_write_vc_len(dev, RPI_COEFFRSTRIDE, de->coeff_stride);
3247 +
3248 + apb_write_vc_addr(dev, RPI_OUTYBASE, de->frame_addr);
3249 + apb_write_vc_addr(dev, RPI_OUTCBASE,
3250 + de->frame_addr + de->frame_c_offset);
3251 + apb_write_vc_len(dev, RPI_OUTYSTRIDE, de->frame_stride);
3252 + apb_write_vc_len(dev, RPI_OUTCSTRIDE, de->frame_stride);
3253 +
3254 + // v4l2_info(&dev->v4l2_dev, "Frame: Y=%llx, C=%llx, Stride=%x\n",
3255 + // de->frame_addr, de->frame_addr + de->frame_c_offset,
3256 + // de->frame_stride);
3257 +
3258 + for (i = 0; i < 16; i++) {
3259 + // Strides are in fact unused but fill in anyway
3260 + apb_write_vc_addr(dev, 0x9000 + 16 * i, de->ref_addrs[i]);
3261 + apb_write_vc_len(dev, 0x9004 + 16 * i, de->frame_stride);
3262 + apb_write_vc_addr(dev, 0x9008 + 16 * i,
3263 + de->ref_addrs[i] + de->frame_c_offset);
3264 + apb_write_vc_len(dev, 0x900C + 16 * i, de->frame_stride);
3265 + }
3266 +
3267 + apb_write(dev, RPI_CONFIG2, de->rpi_config2);
3268 + apb_write(dev, RPI_FRAMESIZE, de->rpi_framesize);
3269 + apb_write(dev, RPI_CURRPOC, de->rpi_currpoc);
3270 + // v4l2_info(&dev->v4l2_dev, "Config2=%#x, FrameSize=%#x, POC=%#x\n",
3271 + // de->rpi_config2, de->rpi_framesize, de->rpi_currpoc);
3272 +
3273 + // collocated reads/writes
3274 + apb_write_vc_len(dev, RPI_COLSTRIDE,
3275 + de->ctx->colmv_stride); // Read vals
3276 + apb_write_vc_len(dev, RPI_MVSTRIDE,
3277 + de->ctx->colmv_stride); // Write vals
3278 + apb_write_vc_addr(dev, RPI_MVBASE,
3279 + !de->frame_aux ? 0 : de->frame_aux->col.addr);
3280 + apb_write_vc_addr(dev, RPI_COLBASE,
3281 + !de->col_aux ? 0 : de->col_aux->col.addr);
3282 +
3283 + //v4l2_info(&dev->v4l2_dev,
3284 + // "Mv=%llx, Col=%llx, Stride=%x, Buf=%llx->%llx\n",
3285 + // de->rpi_mvbase, de->rpi_colbase, de->ctx->colmv_stride,
3286 + // de->ctx->colmvbuf.addr, de->ctx->colmvbuf.addr +
3287 + // de->ctx->colmvbuf.size);
3288 +
3289 + rpivid_hw_irq_active2_irq(dev, &de->irq_ent, phase2_cb, de);
3290 +
3291 + apb_write_final(dev, RPI_NUMROWS, de->pic_height_in_ctbs_y);
3292 +
3293 + xtrace_ok(dev, de);
3294 +}
3295 +
3296 +static void phase1_claimed(struct rpivid_dev *const dev, void *v);
3297 +
3298 +// release any and all objects associated with de
3299 +// and reenable phase 1 if required
3300 +static void phase1_err_fin(struct rpivid_dev *const dev,
3301 + struct rpivid_ctx *const ctx,
3302 + struct rpivid_dec_env *const de)
3303 +{
3304 + /* Return all detached buffers */
3305 + if (de->src_buf)
3306 + v4l2_m2m_buf_done(de->src_buf, VB2_BUF_STATE_ERROR);
3307 + de->src_buf = NULL;
3308 + if (de->frame_buf)
3309 + v4l2_m2m_buf_done(de->frame_buf, VB2_BUF_STATE_ERROR);
3310 + de->frame_buf = NULL;
3311 +#if USE_REQUEST_PIN
3312 + if (de->req_pin)
3313 + media_request_unpin(de->req_pin);
3314 + de->req_pin = NULL;
3315 +#else
3316 + if (de->req_obj)
3317 + media_request_object_complete(de->req_obj);
3318 + de->req_obj = NULL;
3319 +#endif
3320 +
3321 + dec_env_delete(de);
3322 +
3323 + /* Reenable phase 0 if we were blocking */
3324 + if (atomic_add_return(-1, &ctx->p1out) >= RPIVID_P1BUF_COUNT - 1)
3325 + v4l2_m2m_job_finish(dev->m2m_dev, ctx->fh.m2m_ctx);
3326 +
3327 + /* Done with P1-P2 buffers - allow new P1 */
3328 + rpivid_hw_irq_active1_enable_claim(dev, 1);
3329 +}
3330 +
3331 +static void phase1_thread(struct rpivid_dev *const dev, void *v)
3332 +{
3333 + struct rpivid_dec_env *const de = v;
3334 + struct rpivid_ctx *const ctx = de->ctx;
3335 +
3336 + struct rpivid_gptr *const pu_gptr = ctx->pu_bufs + ctx->p2idx;
3337 + struct rpivid_gptr *const coeff_gptr = ctx->coeff_bufs + ctx->p2idx;
3338 +
3339 + xtrace_in(dev, de);
3340 +
3341 + if (de->p1_status & STATUS_PU_EXHAUSTED) {
3342 + if (gptr_realloc_new(dev, pu_gptr, next_size(pu_gptr->size))) {
3343 + v4l2_err(&dev->v4l2_dev,
3344 + "%s: PU realloc (%zx) failed\n",
3345 + __func__, pu_gptr->size);
3346 + goto fail;
3347 + }
3348 + v4l2_info(&dev->v4l2_dev, "%s: PU realloc (%zx) OK\n",
3349 + __func__, pu_gptr->size);
3350 + }
3351 +
3352 + if (de->p1_status & STATUS_COEFF_EXHAUSTED) {
3353 + if (gptr_realloc_new(dev, coeff_gptr,
3354 + next_size(coeff_gptr->size))) {
3355 + v4l2_err(&dev->v4l2_dev,
3356 + "%s: Coeff realloc (%zx) failed\n",
3357 + __func__, coeff_gptr->size);
3358 + goto fail;
3359 + }
3360 + v4l2_info(&dev->v4l2_dev, "%s: Coeff realloc (%zx) OK\n",
3361 + __func__, coeff_gptr->size);
3362 + }
3363 +
3364 + phase1_claimed(dev, de);
3365 + xtrace_ok(dev, de);
3366 + return;
3367 +
3368 +fail:
3369 + if (!pu_gptr->addr || !coeff_gptr->addr) {
3370 + v4l2_err(&dev->v4l2_dev,
3371 + "%s: Fatal: failed to reclaim old alloc\n",
3372 + __func__);
3373 + ctx->fatal_err = 1;
3374 + }
3375 + xtrace_fail(dev, de);
3376 + phase1_err_fin(dev, ctx, de);
3377 +}
3378 +
3379 +/* Always called in irq context (this is good) */
3380 +static void phase1_cb(struct rpivid_dev *const dev, void *v)
3381 +{
3382 + struct rpivid_dec_env *const de = v;
3383 + struct rpivid_ctx *const ctx = de->ctx;
3384 +
3385 + xtrace_in(dev, de);
3386 +
3387 + de->p1_status = check_status(dev);
3388 +
3389 + if (de->p1_status != 0) {
3390 + v4l2_info(&dev->v4l2_dev, "%s: Post wait: %#x\n",
3391 + __func__, de->p1_status);
3392 +
3393 + if (de->p1_status < 0)
3394 + goto fail;
3395 +
3396 + /* Need to realloc - push onto a thread rather than IRQ */
3397 + rpivid_hw_irq_active1_thread(dev, &de->irq_ent,
3398 + phase1_thread, de);
3399 + return;
3400 + }
3401 +
3402 + v4l2_m2m_buf_done(de->src_buf, VB2_BUF_STATE_DONE);
3403 + de->src_buf = NULL;
3404 +
3405 + /* All phase1 error paths done - it is safe to inc p2idx */
3406 + ctx->p2idx =
3407 + (ctx->p2idx + 1 >= RPIVID_P2BUF_COUNT) ? 0 : ctx->p2idx + 1;
3408 +
3409 + /* Re-enable the next setup if we were blocking */
3410 + if (atomic_add_return(-1, &ctx->p1out) >= RPIVID_P1BUF_COUNT - 1) {
3411 + xtrace_fin(dev, de);
3412 + v4l2_m2m_job_finish(dev->m2m_dev, ctx->fh.m2m_ctx);
3413 + }
3414 +
3415 + rpivid_hw_irq_active2_claim(dev, &de->irq_ent, phase2_claimed, de);
3416 +
3417 + xtrace_ok(dev, de);
3418 + return;
3419 +
3420 +fail:
3421 + xtrace_fail(dev, de);
3422 + phase1_err_fin(dev, ctx, de);
3423 +}
3424 +
3425 +static void phase1_claimed(struct rpivid_dev *const dev, void *v)
3426 +{
3427 + struct rpivid_dec_env *const de = v;
3428 + struct rpivid_ctx *const ctx = de->ctx;
3429 +
3430 + const struct rpivid_gptr * const pu_gptr = ctx->pu_bufs + ctx->p2idx;
3431 + const struct rpivid_gptr * const coeff_gptr = ctx->coeff_bufs +
3432 + ctx->p2idx;
3433 +
3434 + xtrace_in(dev, de);
3435 +
3436 + if (ctx->fatal_err)
3437 + goto fail;
3438 +
3439 + de->pu_base_vc = pu_gptr->addr;
3440 + de->pu_stride =
3441 + ALIGN_DOWN(pu_gptr->size / de->pic_height_in_ctbs_y, 64);
3442 +
3443 + de->coeff_base_vc = coeff_gptr->addr;
3444 + de->coeff_stride =
3445 + ALIGN_DOWN(coeff_gptr->size / de->pic_height_in_ctbs_y, 64);
3446 +
3447 + /* phase1_claimed is blocked until phase1_cb has completed, so p2idx is
3448 + * incremented in phase1_cb after error detection
3449 + */
3450 +
3451 + apb_write_vc_addr(dev, RPI_PUWBASE, de->pu_base_vc);
3452 + apb_write_vc_len(dev, RPI_PUWSTRIDE, de->pu_stride);
3453 + apb_write_vc_addr(dev, RPI_COEFFWBASE, de->coeff_base_vc);
3454 + apb_write_vc_len(dev, RPI_COEFFWSTRIDE, de->coeff_stride);
3455 +
3456 + // Trigger command FIFO
3457 + apb_write(dev, RPI_CFNUM, de->cmd_len);
3458 +
3459 + // Claim irq
3460 + rpivid_hw_irq_active1_irq(dev, &de->irq_ent, phase1_cb, de);
3461 +
3462 + // And start the h/w
3463 + apb_write_vc_addr_final(dev, RPI_CFBASE, de->cmd_addr);
3464 +
3465 + xtrace_ok(dev, de);
3466 + return;
3467 +
3468 +fail:
3469 + xtrace_fail(dev, de);
3470 + phase1_err_fin(dev, ctx, de);
3471 +}
3472 +
3473 +static void dec_state_delete(struct rpivid_ctx *const ctx)
3474 +{
3475 + unsigned int i;
3476 + struct rpivid_dec_state *const s = ctx->state;
3477 +
3478 + if (!s)
3479 + return;
3480 + ctx->state = NULL;
3481 +
3482 + free_ps_info(s);
3483 +
3484 + for (i = 0; i != HEVC_MAX_REFS; ++i)
3485 + aux_q_release(ctx, &s->ref_aux[i]);
3486 + aux_q_release(ctx, &s->frame_aux);
3487 +
3488 + kfree(s);
3489 +}
3490 +
3491 +struct irq_sync {
3492 + atomic_t done;
3493 + wait_queue_head_t wq;
3494 + struct rpivid_hw_irq_ent irq_ent;
3495 +};
3496 +
3497 +static void phase2_sync_claimed(struct rpivid_dev *const dev, void *v)
3498 +{
3499 + struct irq_sync *const sync = v;
3500 +
3501 + atomic_set(&sync->done, 1);
3502 + wake_up(&sync->wq);
3503 +}
3504 +
3505 +static void phase1_sync_claimed(struct rpivid_dev *const dev, void *v)
3506 +{
3507 + struct irq_sync *const sync = v;
3508 +
3509 + rpivid_hw_irq_active1_enable_claim(dev, 1);
3510 + rpivid_hw_irq_active2_claim(dev, &sync->irq_ent, phase2_sync_claimed, sync);
3511 +}
3512 +
3513 +/* Sync with IRQ operations
3514 + *
3515 + * Claims phase1 and phase2 in turn and waits for the phase2 claim so any
3516 + * pending IRQ ops will have completed by the time this returns
3517 + *
3518 + * phase1 has counted enables so must reenable once claimed
3519 + * phase2 has unlimited enables
3520 + */
3521 +static void irq_sync(struct rpivid_dev *const dev)
3522 +{
3523 + struct irq_sync sync;
3524 +
3525 + atomic_set(&sync.done, 0);
3526 + init_waitqueue_head(&sync.wq);
3527 +
3528 + rpivid_hw_irq_active1_claim(dev, &sync.irq_ent, phase1_sync_claimed, &sync);
3529 + wait_event(sync.wq, atomic_read(&sync.done));
3530 +}
3531 +
3532 +static void h265_ctx_uninit(struct rpivid_dev *const dev, struct rpivid_ctx *ctx)
3533 +{
3534 + unsigned int i;
3535 +
3536 + dec_env_uninit(ctx);
3537 + dec_state_delete(ctx);
3538 +
3539 + // dec_env & state must be killed before this to release the buffer to
3540 + // the free pool
3541 + aux_q_uninit(ctx);
3542 +
3543 + for (i = 0; i != ARRAY_SIZE(ctx->bitbufs); ++i)
3544 + gptr_free(dev, ctx->bitbufs + i);
3545 + for (i = 0; i != ARRAY_SIZE(ctx->pu_bufs); ++i)
3546 + gptr_free(dev, ctx->pu_bufs + i);
3547 + for (i = 0; i != ARRAY_SIZE(ctx->coeff_bufs); ++i)
3548 + gptr_free(dev, ctx->coeff_bufs + i);
3549 +}
3550 +
3551 +static void rpivid_h265_stop(struct rpivid_ctx *ctx)
3552 +{
3553 + struct rpivid_dev *const dev = ctx->dev;
3554 +
3555 + v4l2_info(&dev->v4l2_dev, "%s\n", __func__);
3556 +
3557 + irq_sync(dev);
3558 + h265_ctx_uninit(dev, ctx);
3559 +}
3560 +
3561 +static int rpivid_h265_start(struct rpivid_ctx *ctx)
3562 +{
3563 + struct rpivid_dev *const dev = ctx->dev;
3564 + unsigned int i;
3565 +
3566 + unsigned int w = ctx->dst_fmt.width;
3567 + unsigned int h = ctx->dst_fmt.height;
3568 + unsigned int wxh;
3569 + size_t pu_alloc;
3570 + size_t coeff_alloc;
3571 +
3572 +#if DEBUG_TRACE_P1_CMD
3573 + p1_z = 0;
3574 +#endif
3575 +
3576 + // Generate a sanitised WxH for memory alloc
3577 + // Assume HD if unset
3578 + if (w == 0)
3579 + w = 1920;
3580 + if (w > 4096)
3581 + w = 4096;
3582 + if (h == 0)
3583 + h = 1088;
3584 + if (h > 4096)
3585 + h = 4096;
3586 + wxh = w * h;
3587 +
3588 + v4l2_info(&dev->v4l2_dev, "%s: (%dx%d)\n", __func__,
3589 + ctx->dst_fmt.width, ctx->dst_fmt.height);
3590 +
3591 + ctx->fatal_err = 0;
3592 + ctx->dec0 = NULL;
3593 + ctx->state = kzalloc(sizeof(*ctx->state), GFP_KERNEL);
3594 + if (!ctx->state) {
3595 + v4l2_err(&dev->v4l2_dev, "Failed to allocate decode state\n");
3596 + goto fail;
3597 + }
3598 +
3599 + if (dec_env_init(ctx) != 0) {
3600 + v4l2_err(&dev->v4l2_dev, "Failed to allocate decode envs\n");
3601 + goto fail;
3602 + }
3603 +
3604 + // Finger in the air PU & Coeff alloc
3605 + // Will be realloced if too small
3606 + coeff_alloc = rpivid_round_up_size(wxh);
3607 + pu_alloc = rpivid_round_up_size(wxh / 4);
3608 + for (i = 0; i != ARRAY_SIZE(ctx->pu_bufs); ++i) {
3609 + // Don't actually need a kernel mapping here
3610 + if (gptr_alloc(dev, ctx->pu_bufs + i, pu_alloc,
3611 + DMA_ATTR_NO_KERNEL_MAPPING))
3612 + goto fail;
3613 + if (gptr_alloc(dev, ctx->coeff_bufs + i, coeff_alloc,
3614 + DMA_ATTR_NO_KERNEL_MAPPING))
3615 + goto fail;
3616 + }
3617 + aux_q_init(ctx);
3618 +
3619 + return 0;
3620 +
3621 +fail:
3622 + h265_ctx_uninit(dev, ctx);
3623 + return -ENOMEM;
3624 +}
3625 +
3626 +static void rpivid_h265_trigger(struct rpivid_ctx *ctx)
3627 +{
3628 + struct rpivid_dev *const dev = ctx->dev;
3629 + struct rpivid_dec_env *const de = ctx->dec0;
3630 +
3631 + xtrace_in(dev, de);
3632 +
3633 + switch (!de ? RPIVID_DECODE_ERROR_CONTINUE : de->state) {
3634 + case RPIVID_DECODE_SLICE_START:
3635 + de->state = RPIVID_DECODE_SLICE_CONTINUE;
3636 + fallthrough;
3637 + case RPIVID_DECODE_SLICE_CONTINUE:
3638 + v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev, ctx->fh.m2m_ctx,
3639 + VB2_BUF_STATE_DONE);
3640 + xtrace_ok(dev, de);
3641 + break;
3642 +
3643 + default:
3644 + v4l2_err(&dev->v4l2_dev, "%s: Unexpected state: %d\n", __func__,
3645 + de->state);
3646 + fallthrough;
3647 + case RPIVID_DECODE_ERROR_DONE:
3648 + ctx->dec0 = NULL;
3649 + dec_env_delete(de);
3650 + fallthrough;
3651 + case RPIVID_DECODE_ERROR_CONTINUE:
3652 + xtrace_fin(dev, de);
3653 + v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev, ctx->fh.m2m_ctx,
3654 + VB2_BUF_STATE_ERROR);
3655 + break;
3656 +
3657 + case RPIVID_DECODE_PHASE1:
3658 + ctx->dec0 = NULL;
3659 +
3660 +#if !USE_REQUEST_PIN
3661 + /* Alloc a new request object - needs to be alloced dynamically
3662 + * as the media request will release it some random time after
3663 + * it is completed
3664 + */
3665 + de->req_obj = kmalloc(sizeof(*de->req_obj), GFP_KERNEL);
3666 + if (!de->req_obj) {
3667 + xtrace_fail(dev, de);
3668 + dec_env_delete(de);
3669 + v4l2_m2m_buf_done_and_job_finish(dev->m2m_dev,
3670 + ctx->fh.m2m_ctx,
3671 + VB2_BUF_STATE_ERROR);
3672 + break;
3673 + }
3674 + media_request_object_init(de->req_obj);
3675 +#warning probably needs to _get the req obj too
3676 +#endif
3677 + ctx->p1idx = (ctx->p1idx + 1 >= RPIVID_P1BUF_COUNT) ?
3678 + 0 : ctx->p1idx + 1;
3679 +
3680 + /* We know we have src & dst so no need to test */
3681 + de->src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
3682 + de->frame_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
3683 +
3684 +#if USE_REQUEST_PIN
3685 + de->req_pin = de->src_buf->vb2_buf.req_obj.req;
3686 + media_request_pin(de->req_pin);
3687 +#else
3688 + media_request_object_bind(de->src_buf->vb2_buf.req_obj.req,
3689 + &dst_req_obj_ops, de, false,
3690 + de->req_obj);
3691 +#endif
3692 +
3693 + /* We could get rid of the src buffer here if we've already
3694 + * copied it, but we don't copy the last buffer unless it
3695 + * didn't return a contig dma addr and that shouldn't happen
3696 + */
3697 +
3698 + /* Enable the next setup if our Q isn't too big */
3699 + if (atomic_add_return(1, &ctx->p1out) < RPIVID_P1BUF_COUNT) {
3700 + xtrace_fin(dev, de);
3701 + v4l2_m2m_job_finish(dev->m2m_dev, ctx->fh.m2m_ctx);
3702 + }
3703 +
3704 + rpivid_hw_irq_active1_claim(dev, &de->irq_ent, phase1_claimed,
3705 + de);
3706 + xtrace_ok(dev, de);
3707 + break;
3708 + }
3709 +}
3710 +
3711 +const struct rpivid_dec_ops rpivid_dec_ops_h265 = {
3712 + .setup = rpivid_h265_setup,
3713 + .start = rpivid_h265_start,
3714 + .stop = rpivid_h265_stop,
3715 + .trigger = rpivid_h265_trigger,
3716 +};
3717 +
3718 +static int try_ctrl_sps(struct v4l2_ctrl *ctrl)
3719 +{
3720 + const struct v4l2_ctrl_hevc_sps *const sps = ctrl->p_new.p_hevc_sps;
3721 + struct rpivid_ctx *const ctx = ctrl->priv;
3722 + struct rpivid_dev *const dev = ctx->dev;
3723 +
3724 + if (sps->chroma_format_idc != 1) {
3725 + v4l2_warn(&dev->v4l2_dev,
3726 + "Chroma format (%d) unsupported\n",
3727 + sps->chroma_format_idc);
3728 + return -EINVAL;
3729 + }
3730 +
3731 + if (sps->bit_depth_luma_minus8 != 0 &&
3732 + sps->bit_depth_luma_minus8 != 2) {
3733 + v4l2_warn(&dev->v4l2_dev,
3734 + "Luma depth (%d) unsupported\n",
3735 + sps->bit_depth_luma_minus8 + 8);
3736 + return -EINVAL;
3737 + }
3738 +
3739 + if (sps->bit_depth_luma_minus8 != sps->bit_depth_chroma_minus8) {
3740 + v4l2_warn(&dev->v4l2_dev,
3741 + "Chroma depth (%d) != Luma depth (%d)\n",
3742 + sps->bit_depth_chroma_minus8 + 8,
3743 + sps->bit_depth_luma_minus8 + 8);
3744 + return -EINVAL;
3745 + }
3746 +
3747 + if (!sps->pic_width_in_luma_samples ||
3748 + !sps->pic_height_in_luma_samples ||
3749 + sps->pic_width_in_luma_samples > 4096 ||
3750 + sps->pic_height_in_luma_samples > 4096) {
3751 + v4l2_warn(&dev->v4l2_dev,
3752 + "Bad sps width (%u) x height (%u)\n",
3753 + sps->pic_width_in_luma_samples,
3754 + sps->pic_height_in_luma_samples);
3755 + return -EINVAL;
3756 + }
3757 +
3758 + if (!ctx->dst_fmt_set)
3759 + return 0;
3760 +
3761 + if ((sps->bit_depth_luma_minus8 == 0 &&
3762 + ctx->dst_fmt.pixelformat != V4L2_PIX_FMT_NV12_COL128) ||
3763 + (sps->bit_depth_luma_minus8 == 2 &&
3764 + ctx->dst_fmt.pixelformat != V4L2_PIX_FMT_NV12_10_COL128)) {
3765 + v4l2_warn(&dev->v4l2_dev,
3766 + "SPS luma depth %d does not match capture format\n",
3767 + sps->bit_depth_luma_minus8 + 8);
3768 + return -EINVAL;
3769 + }
3770 +
3771 + if (sps->pic_width_in_luma_samples > ctx->dst_fmt.width ||
3772 + sps->pic_height_in_luma_samples > ctx->dst_fmt.height) {
3773 + v4l2_warn(&dev->v4l2_dev,
3774 + "SPS size (%dx%d) > capture size (%d,%d)\n",
3775 + sps->pic_width_in_luma_samples,
3776 + sps->pic_height_in_luma_samples,
3777 + ctx->dst_fmt.width,
3778 + ctx->dst_fmt.height);
3779 + return -EINVAL;
3780 + }
3781 +
3782 + return 0;
3783 +}
3784 +
3785 +const struct v4l2_ctrl_ops rpivid_hevc_sps_ctrl_ops = {
3786 + .try_ctrl = try_ctrl_sps,
3787 +};
3788 +
3789 +static int try_ctrl_pps(struct v4l2_ctrl *ctrl)
3790 +{
3791 + const struct v4l2_ctrl_hevc_pps *const pps = ctrl->p_new.p_hevc_pps;
3792 + struct rpivid_ctx *const ctx = ctrl->priv;
3793 + struct rpivid_dev *const dev = ctx->dev;
3794 +
3795 + if ((pps->flags &
3796 + V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED) &&
3797 + (pps->flags &
3798 + V4L2_HEVC_PPS_FLAG_TILES_ENABLED) &&
3799 + (pps->num_tile_columns_minus1 || pps->num_tile_rows_minus1)) {
3800 + v4l2_warn(&dev->v4l2_dev,
3801 + "WPP + Tiles not supported\n");
3802 + return -EINVAL;
3803 + }
3804 +
3805 + return 0;
3806 +}
3807 +
3808 +const struct v4l2_ctrl_ops rpivid_hevc_pps_ctrl_ops = {
3809 + .try_ctrl = try_ctrl_pps,
3810 +};
3811 +
3812 --- /dev/null
3813 +++ b/drivers/staging/media/rpivid/rpivid_hw.c
3814 @@ -0,0 +1,383 @@
3815 +// SPDX-License-Identifier: GPL-2.0
3816 +/*
3817 + * Raspberry Pi HEVC driver
3818 + *
3819 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
3820 + *
3821 + * Based on the Cedrus VPU driver, that is:
3822 + *
3823 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
3824 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
3825 + * Copyright (C) 2018 Bootlin
3826 + */
3827 +#include <linux/clk.h>
3828 +#include <linux/component.h>
3829 +#include <linux/dma-mapping.h>
3830 +#include <linux/interrupt.h>
3831 +#include <linux/io.h>
3832 +#include <linux/of_reserved_mem.h>
3833 +#include <linux/of_device.h>
3834 +#include <linux/of_platform.h>
3835 +#include <linux/platform_device.h>
3836 +#include <linux/regmap.h>
3837 +#include <linux/reset.h>
3838 +
3839 +#include <media/videobuf2-core.h>
3840 +#include <media/v4l2-mem2mem.h>
3841 +
3842 +#include <soc/bcm2835/raspberrypi-firmware.h>
3843 +
3844 +#include "rpivid.h"
3845 +#include "rpivid_hw.h"
3846 +
3847 +static void pre_irq(struct rpivid_dev *dev, struct rpivid_hw_irq_ent *ient,
3848 + rpivid_irq_callback cb, void *v,
3849 + struct rpivid_hw_irq_ctrl *ictl)
3850 +{
3851 + unsigned long flags;
3852 +
3853 + if (ictl->irq) {
3854 + v4l2_err(&dev->v4l2_dev, "Attempt to claim IRQ when already claimed\n");
3855 + return;
3856 + }
3857 +
3858 + ient->cb = cb;
3859 + ient->v = v;
3860 +
3861 + spin_lock_irqsave(&ictl->lock, flags);
3862 + ictl->irq = ient;
3863 + ictl->no_sched++;
3864 + spin_unlock_irqrestore(&ictl->lock, flags);
3865 +}
3866 +
3867 +/* Should be called from inside ictl->lock */
3868 +static inline bool sched_enabled(const struct rpivid_hw_irq_ctrl * const ictl)
3869 +{
3870 + return ictl->no_sched <= 0 && ictl->enable;
3871 +}
3872 +
3873 +/* Should be called from inside ictl->lock & after checking sched_enabled() */
3874 +static inline void set_claimed(struct rpivid_hw_irq_ctrl * const ictl)
3875 +{
3876 + if (ictl->enable > 0)
3877 + --ictl->enable;
3878 + ictl->no_sched = 1;
3879 +}
3880 +
3881 +/* Should be called from inside ictl->lock */
3882 +static struct rpivid_hw_irq_ent *get_sched(struct rpivid_hw_irq_ctrl * const ictl)
3883 +{
3884 + struct rpivid_hw_irq_ent *ient;
3885 +
3886 + if (!sched_enabled(ictl))
3887 + return NULL;
3888 +
3889 + ient = ictl->claim;
3890 + if (!ient)
3891 + return NULL;
3892 + ictl->claim = ient->next;
3893 +
3894 + set_claimed(ictl);
3895 + return ient;
3896 +}
3897 +
3898 +/* Run a callback & check to see if there is anything else to run */
3899 +static void sched_cb(struct rpivid_dev * const dev,
3900 + struct rpivid_hw_irq_ctrl * const ictl,
3901 + struct rpivid_hw_irq_ent *ient)
3902 +{
3903 + while (ient) {
3904 + unsigned long flags;
3905 +
3906 + ient->cb(dev, ient->v);
3907 +
3908 + spin_lock_irqsave(&ictl->lock, flags);
3909 +
3910 + /* Always dec no_sched after cb exec - must have been set
3911 + * on entry to cb
3912 + */
3913 + --ictl->no_sched;
3914 + ient = get_sched(ictl);
3915 +
3916 + spin_unlock_irqrestore(&ictl->lock, flags);
3917 + }
3918 +}
3919 +
3920 +/* Should only ever be called from its own IRQ cb so no lock required */
3921 +static void pre_thread(struct rpivid_dev *dev,
3922 + struct rpivid_hw_irq_ent *ient,
3923 + rpivid_irq_callback cb, void *v,
3924 + struct rpivid_hw_irq_ctrl *ictl)
3925 +{
3926 + ient->cb = cb;
3927 + ient->v = v;
3928 + ictl->irq = ient;
3929 + ictl->thread_reqed = true;
3930 + ictl->no_sched++; /* This is unwound in do_thread */
3931 +}
3932 +
3933 +// Called in irq context
3934 +static void do_irq(struct rpivid_dev * const dev,
3935 + struct rpivid_hw_irq_ctrl * const ictl)
3936 +{
3937 + struct rpivid_hw_irq_ent *ient;
3938 + unsigned long flags;
3939 +
3940 + spin_lock_irqsave(&ictl->lock, flags);
3941 + ient = ictl->irq;
3942 + ictl->irq = NULL;
3943 + spin_unlock_irqrestore(&ictl->lock, flags);
3944 +
3945 + sched_cb(dev, ictl, ient);
3946 +}
3947 +
3948 +static void do_claim(struct rpivid_dev * const dev,
3949 + struct rpivid_hw_irq_ent *ient,
3950 + const rpivid_irq_callback cb, void * const v,
3951 + struct rpivid_hw_irq_ctrl * const ictl)
3952 +{
3953 + unsigned long flags;
3954 +
3955 + ient->next = NULL;
3956 + ient->cb = cb;
3957 + ient->v = v;
3958 +
3959 + spin_lock_irqsave(&ictl->lock, flags);
3960 +
3961 + if (ictl->claim) {
3962 + // If we have a Q then add to end
3963 + ictl->tail->next = ient;
3964 + ictl->tail = ient;
3965 + ient = NULL;
3966 + } else if (!sched_enabled(ictl)) {
3967 + // Empty Q but other activity in progress, so queue it
3968 + ictl->claim = ient;
3969 + ictl->tail = ient;
3970 + ient = NULL;
3971 + } else {
3972 + // Nothing else going on - schedule immediately and
3973 + // prevent anything else scheduling claims
3974 + set_claimed(ictl);
3975 + }
3976 +
3977 + spin_unlock_irqrestore(&ictl->lock, flags);
3978 +
3979 + sched_cb(dev, ictl, ient);
3980 +}
3981 +
3982 +/* Enable n claims.
3983 + * n < 0 set to unlimited (default on init)
3984 + * n = 0 if previously unlimited then disable otherwise nop
3985 + * n > 0 if previously unlimited then set to n enables
3986 + * otherwise add n enables
3987 + * The enable count is automatically decremented every time a claim is run
3988 + */
3989 +static void do_enable_claim(struct rpivid_dev * const dev,
3990 + int n,
3991 + struct rpivid_hw_irq_ctrl * const ictl)
3992 +{
3993 + unsigned long flags;
3994 + struct rpivid_hw_irq_ent *ient;
3995 +
3996 + spin_lock_irqsave(&ictl->lock, flags);
3997 + ictl->enable = n < 0 ? -1 : ictl->enable <= 0 ? n : ictl->enable + n;
3998 + ient = get_sched(ictl);
3999 + spin_unlock_irqrestore(&ictl->lock, flags);
4000 +
4001 + sched_cb(dev, ictl, ient);
4002 +}
4003 +
4004 +static void ictl_init(struct rpivid_hw_irq_ctrl * const ictl, int enables)
4005 +{
4006 + spin_lock_init(&ictl->lock);
4007 + ictl->claim = NULL;
4008 + ictl->tail = NULL;
4009 + ictl->irq = NULL;
4010 + ictl->no_sched = 0;
4011 + ictl->enable = enables;
4012 + ictl->thread_reqed = false;
4013 +}
4014 +
4015 +static void ictl_uninit(struct rpivid_hw_irq_ctrl * const ictl)
4016 +{
4017 + // Nothing to do
4018 +}
4019 +
4020 +#if !OPT_DEBUG_POLL_IRQ
4021 +static irqreturn_t rpivid_irq_irq(int irq, void *data)
4022 +{
4023 + struct rpivid_dev * const dev = data;
4024 + __u32 ictrl;
4025 +
4026 + ictrl = irq_read(dev, ARG_IC_ICTRL);
4027 + if (!(ictrl & ARG_IC_ICTRL_ALL_IRQ_MASK)) {
4028 + v4l2_warn(&dev->v4l2_dev, "IRQ but no IRQ bits set\n");
4029 + return IRQ_NONE;
4030 + }
4031 +
4032 + // Cancel any/all irqs
4033 + irq_write(dev, ARG_IC_ICTRL, ictrl & ~ARG_IC_ICTRL_SET_ZERO_MASK);
4034 +
4035 + // Service Active2 before Active1 so Phase 1 can transition to Phase 2
4036 + // without delay
4037 + if (ictrl & ARG_IC_ICTRL_ACTIVE2_INT_SET)
4038 + do_irq(dev, &dev->ic_active2);
4039 + if (ictrl & ARG_IC_ICTRL_ACTIVE1_INT_SET)
4040 + do_irq(dev, &dev->ic_active1);
4041 +
4042 + return dev->ic_active1.thread_reqed || dev->ic_active2.thread_reqed ?
4043 + IRQ_WAKE_THREAD : IRQ_HANDLED;
4044 +}
4045 +
4046 +static void do_thread(struct rpivid_dev * const dev,
4047 + struct rpivid_hw_irq_ctrl *const ictl)
4048 +{
4049 + unsigned long flags;
4050 + struct rpivid_hw_irq_ent *ient = NULL;
4051 +
4052 + spin_lock_irqsave(&ictl->lock, flags);
4053 +
4054 + if (ictl->thread_reqed) {
4055 + ient = ictl->irq;
4056 + ictl->thread_reqed = false;
4057 + ictl->irq = NULL;
4058 + }
4059 +
4060 + spin_unlock_irqrestore(&ictl->lock, flags);
4061 +
4062 + sched_cb(dev, ictl, ient);
4063 +}
4064 +
4065 +static irqreturn_t rpivid_irq_thread(int irq, void *data)
4066 +{
4067 + struct rpivid_dev * const dev = data;
4068 +
4069 + do_thread(dev, &dev->ic_active1);
4070 + do_thread(dev, &dev->ic_active2);
4071 +
4072 + return IRQ_HANDLED;
4073 +}
4074 +#endif
4075 +
4076 +/* May only be called from Active1 CB
4077 + * IRQs should not be expected until execution continues in the cb
4078 + */
4079 +void rpivid_hw_irq_active1_thread(struct rpivid_dev *dev,
4080 + struct rpivid_hw_irq_ent *ient,
4081 + rpivid_irq_callback thread_cb, void *ctx)
4082 +{
4083 + pre_thread(dev, ient, thread_cb, ctx, &dev->ic_active1);
4084 +}
4085 +
4086 +void rpivid_hw_irq_active1_enable_claim(struct rpivid_dev *dev,
4087 + int n)
4088 +{
4089 + do_enable_claim(dev, n, &dev->ic_active1);
4090 +}
4091 +
4092 +void rpivid_hw_irq_active1_claim(struct rpivid_dev *dev,
4093 + struct rpivid_hw_irq_ent *ient,
4094 + rpivid_irq_callback ready_cb, void *ctx)
4095 +{
4096 + do_claim(dev, ient, ready_cb, ctx, &dev->ic_active1);
4097 +}
4098 +
4099 +void rpivid_hw_irq_active1_irq(struct rpivid_dev *dev,
4100 + struct rpivid_hw_irq_ent *ient,
4101 + rpivid_irq_callback irq_cb, void *ctx)
4102 +{
4103 + pre_irq(dev, ient, irq_cb, ctx, &dev->ic_active1);
4104 +}
4105 +
4106 +void rpivid_hw_irq_active2_claim(struct rpivid_dev *dev,
4107 + struct rpivid_hw_irq_ent *ient,
4108 + rpivid_irq_callback ready_cb, void *ctx)
4109 +{
4110 + do_claim(dev, ient, ready_cb, ctx, &dev->ic_active2);
4111 +}
4112 +
4113 +void rpivid_hw_irq_active2_irq(struct rpivid_dev *dev,
4114 + struct rpivid_hw_irq_ent *ient,
4115 + rpivid_irq_callback irq_cb, void *ctx)
4116 +{
4117 + pre_irq(dev, ient, irq_cb, ctx, &dev->ic_active2);
4118 +}
4119 +
4120 +int rpivid_hw_probe(struct rpivid_dev *dev)
4121 +{
4122 + struct rpi_firmware *firmware;
4123 + struct device_node *node;
4124 + struct resource *res;
4125 + __u32 irq_stat;
4126 + int irq_dec;
4127 + int ret = 0;
4128 +
4129 + ictl_init(&dev->ic_active1, RPIVID_P2BUF_COUNT);
4130 + ictl_init(&dev->ic_active2, RPIVID_ICTL_ENABLE_UNLIMITED);
4131 +
4132 + res = platform_get_resource_byname(dev->pdev, IORESOURCE_MEM, "intc");
4133 + if (!res)
4134 + return -ENODEV;
4135 +
4136 + dev->base_irq = devm_ioremap(dev->dev, res->start, resource_size(res));
4137 + if (!dev->base_irq)
4138 + return -ENOMEM;
4139 +
4140 + res = platform_get_resource_byname(dev->pdev, IORESOURCE_MEM, "hevc");
4141 + if (!res)
4142 + return -ENODEV;
4143 +
4144 + dev->base_h265 = devm_ioremap(dev->dev, res->start, resource_size(res));
4145 + if (!dev->base_h265)
4146 + return -ENOMEM;
4147 +
4148 + dev->clock = devm_clk_get(&dev->pdev->dev, "hevc");
4149 + if (IS_ERR(dev->clock))
4150 + return PTR_ERR(dev->clock);
4151 +
4152 + node = rpi_firmware_find_node();
4153 + if (!node)
4154 + return -EINVAL;
4155 +
4156 + firmware = rpi_firmware_get(node);
4157 + of_node_put(node);
4158 + if (!firmware)
4159 + return -EPROBE_DEFER;
4160 +
4161 + dev->max_clock_rate = rpi_firmware_clk_get_max_rate(firmware,
4162 + RPI_FIRMWARE_HEVC_CLK_ID);
4163 + rpi_firmware_put(firmware);
4164 +
4165 + dev->cache_align = dma_get_cache_alignment();
4166 +
4167 + // Disable IRQs & reset anything pending
4168 + irq_write(dev, ARG_IC_ICTRL,
4169 + ARG_IC_ICTRL_ACTIVE1_EN_SET | ARG_IC_ICTRL_ACTIVE2_EN_SET);
4170 + irq_stat = irq_read(dev, ARG_IC_ICTRL);
4171 + irq_write(dev, ARG_IC_ICTRL, irq_stat);
4172 +
4173 +#if !OPT_DEBUG_POLL_IRQ
4174 + irq_dec = platform_get_irq(dev->pdev, 0);
4175 + if (irq_dec < 0)
4176 + return irq_dec;
4177 + ret = devm_request_threaded_irq(dev->dev, irq_dec,
4178 + rpivid_irq_irq,
4179 + rpivid_irq_thread,
4180 + 0, dev_name(dev->dev), dev);
4181 + if (ret) {
4182 + dev_err(dev->dev, "Failed to request IRQ - %d\n", ret);
4183 +
4184 + return ret;
4185 + }
4186 +#endif
4187 + return ret;
4188 +}
4189 +
4190 +void rpivid_hw_remove(struct rpivid_dev *dev)
4191 +{
4192 + // IRQ auto freed on unload so no need to do it here
4193 + // ioremap auto freed on unload
4194 + ictl_uninit(&dev->ic_active1);
4195 + ictl_uninit(&dev->ic_active2);
4196 +}
4197 +
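The enable-count update in do_enable_claim() above packs three cases into one nested conditional. The following is a minimal standalone sketch of that arithmetic only, with no locking or scheduling. The helper name next_enable is hypothetical, and reading a negative count as "unlimited" is an assumption based on the name RPIVID_ICTL_ENABLE_UNLIMITED, whose value is not shown in this part of the patch.

```c
#include <assert.h>

/* Hypothetical helper mirroring the expression used in do_enable_claim():
 *   n < 0        -> -1          (assumed to mean "unlimited")
 *   enable <= 0  -> n           (counter was exhausted or unlimited: restart at n)
 *   otherwise    -> enable + n  (top up the remaining count)
 */
int next_enable(int enable, int n)
{
	return n < 0 ? -1 : enable <= 0 ? n : enable + n;
}
```

Under these assumptions, topping up an exhausted counter (enable == 0) with n == 2 gives 2, topping up a counter of 3 gives 5, and any negative n gives -1 regardless of the current value.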
4198 --- /dev/null
4199 +++ b/drivers/staging/media/rpivid/rpivid_hw.h
4200 @@ -0,0 +1,303 @@
4201 +/* SPDX-License-Identifier: GPL-2.0 */
4202 +/*
4203 + * Raspberry Pi HEVC driver
4204 + *
4205 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
4206 + *
4207 + * Based on the Cedrus VPU driver, that is:
4208 + *
4209 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
4210 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
4211 + * Copyright (C) 2018 Bootlin
4212 + */
4213 +
4214 +#ifndef _RPIVID_HW_H_
4215 +#define _RPIVID_HW_H_
4216 +
4217 +struct rpivid_hw_irq_ent {
4218 + struct rpivid_hw_irq_ent *next;
4219 + rpivid_irq_callback cb;
4220 + void *v;
4221 +};
4222 +
4223 +/* Phase 1 Register offsets */
4224 +
4225 +#define RPI_SPS0 0
4226 +#define RPI_SPS1 4
4227 +#define RPI_PPS 8
4228 +#define RPI_SLICE 12
4229 +#define RPI_TILESTART 16
4230 +#define RPI_TILEEND 20
4231 +#define RPI_SLICESTART 24
4232 +#define RPI_MODE 28
4233 +#define RPI_LEFT0 32
4234 +#define RPI_LEFT1 36
4235 +#define RPI_LEFT2 40
4236 +#define RPI_LEFT3 44
4237 +#define RPI_QP 48
4238 +#define RPI_CONTROL 52
4239 +#define RPI_STATUS 56
4240 +#define RPI_VERSION 60
4241 +#define RPI_BFBASE 64
4242 +#define RPI_BFNUM 68
4243 +#define RPI_BFCONTROL 72
4244 +#define RPI_BFSTATUS 76
4245 +#define RPI_PUWBASE 80
4246 +#define RPI_PUWSTRIDE 84
4247 +#define RPI_COEFFWBASE 88
4248 +#define RPI_COEFFWSTRIDE 92
4249 +#define RPI_SLICECMDS 96
4250 +#define RPI_BEGINTILEEND 100
4251 +#define RPI_TRANSFER 104
4252 +#define RPI_CFBASE 108
4253 +#define RPI_CFNUM 112
4254 +#define RPI_CFSTATUS 116
4255 +
4256 +/* Phase 2 Register offsets */
4257 +
4258 +#define RPI_PURBASE 0x8000
4259 +#define RPI_PURSTRIDE 0x8004
4260 +#define RPI_COEFFRBASE 0x8008
4261 +#define RPI_COEFFRSTRIDE 0x800C
4262 +#define RPI_NUMROWS 0x8010
4263 +#define RPI_CONFIG2 0x8014
4264 +#define RPI_OUTYBASE 0x8018
4265 +#define RPI_OUTYSTRIDE 0x801C
4266 +#define RPI_OUTCBASE 0x8020
4267 +#define RPI_OUTCSTRIDE 0x8024
4268 +#define RPI_STATUS2 0x8028
4269 +#define RPI_FRAMESIZE 0x802C
4270 +#define RPI_MVBASE 0x8030
4271 +#define RPI_MVSTRIDE 0x8034
4272 +#define RPI_COLBASE 0x8038
4273 +#define RPI_COLSTRIDE 0x803C
4274 +#define RPI_CURRPOC 0x8040
4275 +
4276 +/*
4277 + * Write a general register value
4278 + * Order is unimportant
4279 + */
4280 +static inline void apb_write(const struct rpivid_dev * const dev,
4281 + const unsigned int offset, const u32 val)
4282 +{
4283 + writel_relaxed(val, dev->base_h265 + offset);
4284 +}
4285 +
4286 +/* Write the final register value that actually starts the phase */
4287 +static inline void apb_write_final(const struct rpivid_dev * const dev,
4288 + const unsigned int offset, const u32 val)
4289 +{
4290 + writel(val, dev->base_h265 + offset);
4291 +}
4292 +
4293 +static inline u32 apb_read(const struct rpivid_dev * const dev,
4294 + const unsigned int offset)
4295 +{
4296 + return readl(dev->base_h265 + offset);
4297 +}
4298 +
4299 +static inline void irq_write(const struct rpivid_dev * const dev,
4300 + const unsigned int offset, const u32 val)
4301 +{
4302 + writel(val, dev->base_irq + offset);
4303 +}
4304 +
4305 +static inline u32 irq_read(const struct rpivid_dev * const dev,
4306 + const unsigned int offset)
4307 +{
4308 + return readl(dev->base_irq + offset);
4309 +}
4310 +
4311 +static inline void apb_write_vc_addr(const struct rpivid_dev * const dev,
4312 + const unsigned int offset,
4313 + const dma_addr_t a)
4314 +{
4315 + apb_write(dev, offset, (u32)(a >> 6));
4316 +}
4317 +
4318 +static inline void apb_write_vc_addr_final(const struct rpivid_dev * const dev,
4319 + const unsigned int offset,
4320 + const dma_addr_t a)
4321 +{
4322 + apb_write_final(dev, offset, (u32)(a >> 6));
4323 +}
4324 +
4325 +static inline void apb_write_vc_len(const struct rpivid_dev * const dev,
4326 + const unsigned int offset,
4327 + const unsigned int x)
4328 +{
4329 + apb_write(dev, offset, (x + 63) >> 6);
4330 +}
4331 +
4332 +/* *ARG_IC_ICTRL - Interrupt control for ARGON Core*
4333 + * Offset (byte space) = 40'h2b10000
4334 + * Physical Address (byte space) = 40'h7eb10000
4335 + * Verilog Macro Address = `ARG_IC_REG_START + `ARGON_INTCTRL_ICTRL
4336 + * Reset Value = 32'b100x100x_100xxxxx_xxxxxxx0_x100x100
4337 + * Access = RW (32-bit only)
4338 + * Interrupt control logic for ARGON Core.
4339 + */
4340 +#define ARG_IC_ICTRL 0
4341 +
4342 +/* acc=LWC ACTIVE1_INT FIELD ACCESS: LWC
4343 + *
4344 + * Interrupt 1
4345 + * This is set and held when an hevc_active1 interrupt edge is detected
4346 + * The polarity of the edge is set by the ACTIVE1_EDGE field
4347 + * Write a 1 to this bit to clear down the latched interrupt
4348 + * The latched interrupt is only enabled out onto the interrupt line if
4349 + * ACTIVE1_EN is set
4350 + * Reset value is *0* decimal.
4351 + */
4352 +#define ARG_IC_ICTRL_ACTIVE1_INT_SET BIT(0)
4353 +
4354 +/* ACTIVE1_EDGE Sets the polarity of the interrupt edge detection logic
4355 + * This logic detects edges of the hevc_active1 line from the argon core
4356 + * 0 = negedge, 1 = posedge
4357 + * Reset value is *0* decimal.
4358 + */
4359 +#define ARG_IC_ICTRL_ACTIVE1_EDGE_SET BIT(1)
4360 +
4361 +/* ACTIVE1_EN Enables ACTIVE1_INT out onto the argon interrupt line.
4362 + * If this isn't set, the interrupt logic will work but no interrupt will be
4363 + * sent to the interrupt controller
4364 + * Reset value is *1* decimal.
4365 + *
4366 + * [JC] The above appears to be a lie - if unset then b0 is never set
4367 + */
4368 +#define ARG_IC_ICTRL_ACTIVE1_EN_SET BIT(2)
4369 +
4370 +/* acc=RO ACTIVE1_STATUS FIELD ACCESS: RO
4371 + *
4372 + * The current status of the hevc_active1 signal
4373 + */
4374 +#define ARG_IC_ICTRL_ACTIVE1_STATUS_SET BIT(3)
4375 +
4376 +/* acc=LWC ACTIVE2_INT FIELD ACCESS: LWC
4377 + *
4378 + * Interrupt 2
4379 + * This is set and held when an hevc_active2 interrupt edge is detected
4380 + * The polarity of the edge is set by the ACTIVE2_EDGE field
4381 + * Write a 1 to this bit to clear down the latched interrupt
4382 + * The latched interrupt is only enabled out onto the interrupt line if
4383 + * ACTIVE2_EN is set
4384 + * Reset value is *0* decimal.
4385 + */
4386 +#define ARG_IC_ICTRL_ACTIVE2_INT_SET BIT(4)
4387 +
4388 +/* ACTIVE2_EDGE Sets the polarity of the interrupt edge detection logic
4389 + * This logic detects edges of the hevc_active2 line from the argon core
4390 + * 0 = negedge, 1 = posedge
4391 + * Reset value is *0* decimal.
4392 + */
4393 +#define ARG_IC_ICTRL_ACTIVE2_EDGE_SET BIT(5)
4394 +
4395 +/* ACTIVE2_EN Enables ACTIVE2_INT out onto the argon interrupt line.
4396 + * If this isn't set, the interrupt logic will work but no interrupt will be
4397 + * sent to the interrupt controller
4398 + * Reset value is *1* decimal.
4399 + */
4400 +#define ARG_IC_ICTRL_ACTIVE2_EN_SET BIT(6)
4401 +
4402 +/* acc=RO ACTIVE2_STATUS FIELD ACCESS: RO
4403 + *
4404 + * The current status of the hevc_active2 signal
4405 + */
4406 +#define ARG_IC_ICTRL_ACTIVE2_STATUS_SET BIT(7)
4407 +
4408 +/* TEST_INT Forces the argon int high for test purposes.
4409 + * Reset value is *0* decimal.
4410 + */
4411 +#define ARG_IC_ICTRL_TEST_INT BIT(8)
4412 +#define ARG_IC_ICTRL_SPARE BIT(9)
4413 +
4414 +/* acc=RO VP9_INTERRUPT_STATUS FIELD ACCESS: RO
4415 + *
4416 + * The current status of the vp9_interrupt signal
4417 + */
4418 +#define ARG_IC_ICTRL_VP9_INTERRUPT_STATUS BIT(10)
4419 +
4420 +/* AIO_INT_ENABLE 1 = Or the AIO int in with the Argon int so the VPU can see
4421 + * it
4422 + * 0 = the AIO int is masked. (It should still be connected to the GIC though).
4423 + */
4424 +#define ARG_IC_ICTRL_AIO_INT_ENABLE BIT(20)
4425 +#define ARG_IC_ICTRL_H264_ACTIVE_INT BIT(21)
4426 +#define ARG_IC_ICTRL_H264_ACTIVE_EDGE BIT(22)
4427 +#define ARG_IC_ICTRL_H264_ACTIVE_EN BIT(23)
4428 +#define ARG_IC_ICTRL_H264_ACTIVE_STATUS BIT(24)
4429 +#define ARG_IC_ICTRL_H264_INTERRUPT_INT BIT(25)
4430 +#define ARG_IC_ICTRL_H264_INTERRUPT_EDGE BIT(26)
4431 +#define ARG_IC_ICTRL_H264_INTERRUPT_EN BIT(27)
4432 +
4433 +/* acc=RO H264_INTERRUPT_STATUS FIELD ACCESS: RO
4434 + *
4435 + * The current status of the h264_interrupt signal
4436 + */
4437 +#define ARG_IC_ICTRL_H264_INTERRUPT_STATUS BIT(28)
4438 +
4439 +/* acc=LWC VP9_INTERRUPT_INT FIELD ACCESS: LWC
4440 + *
4441 + * Interrupt 1
4442 + * This is set and held when a vp9_interrupt interrupt edge is detected
4443 + * The polarity of the edge is set by the VP9_INTERRUPT_EDGE field
4444 + * Write a 1 to this bit to clear down the latched interrupt
4445 + * The latched interrupt is only enabled out onto the interrupt line if
4446 + * VP9_INTERRUPT_EN is set
4447 + * Reset value is *0* decimal.
4448 + */
4449 +#define ARG_IC_ICTRL_VP9_INTERRUPT_INT BIT(29)
4450 +
4451 +/* VP9_INTERRUPT_EDGE Sets the polarity of the interrupt edge detection logic
4452 + * This logic detects edges of the vp9_interrupt line from the argon h264 core
4453 + * 0 = negedge, 1 = posedge
4454 + * Reset value is *0* decimal.
4455 + */
4456 +#define ARG_IC_ICTRL_VP9_INTERRUPT_EDGE BIT(30)
4457 +
4458 +/* VP9_INTERRUPT_EN Enables VP9_INTERRUPT_INT out onto the argon interrupt line.
4459 + * If this isn't set, the interrupt logic will work but no interrupt will be
4460 + * sent to the interrupt controller
4461 + * Reset value is *1* decimal.
4462 + */
4463 +#define ARG_IC_ICTRL_VP9_INTERRUPT_EN BIT(31)
4464 +
4465 +/* Bits 19:12, 11 reserved - read ?, write 0 */
4466 +#define ARG_IC_ICTRL_SET_ZERO_MASK ((0xff << 12) | BIT(11))
4467 +
4468 +/* All IRQ bits */
4469 +#define ARG_IC_ICTRL_ALL_IRQ_MASK (\
4470 + ARG_IC_ICTRL_VP9_INTERRUPT_INT |\
4471 + ARG_IC_ICTRL_H264_INTERRUPT_INT |\
4472 + ARG_IC_ICTRL_ACTIVE1_INT_SET |\
4473 + ARG_IC_ICTRL_ACTIVE2_INT_SET)
4474 +
4475 +/* Regulate claim Q */
4476 +void rpivid_hw_irq_active1_enable_claim(struct rpivid_dev *dev,
4477 + int n);
4478 +/* Auto release once all CBs called */
4479 +void rpivid_hw_irq_active1_claim(struct rpivid_dev *dev,
4480 + struct rpivid_hw_irq_ent *ient,
4481 + rpivid_irq_callback ready_cb, void *ctx);
4482 +/* May only be called in claim cb */
4483 +void rpivid_hw_irq_active1_irq(struct rpivid_dev *dev,
4484 + struct rpivid_hw_irq_ent *ient,
4485 + rpivid_irq_callback irq_cb, void *ctx);
4486 +/* May only be called in irq cb */
4487 +void rpivid_hw_irq_active1_thread(struct rpivid_dev *dev,
4488 + struct rpivid_hw_irq_ent *ient,
4489 + rpivid_irq_callback thread_cb, void *ctx);
4490 +
4491 +/* Auto release once all CBs called */
4492 +void rpivid_hw_irq_active2_claim(struct rpivid_dev *dev,
4493 + struct rpivid_hw_irq_ent *ient,
4494 + rpivid_irq_callback ready_cb, void *ctx);
4495 +/* May only be called in claim cb */
4496 +void rpivid_hw_irq_active2_irq(struct rpivid_dev *dev,
4497 + struct rpivid_hw_irq_ent *ient,
4498 + rpivid_irq_callback irq_cb, void *ctx);
4499 +
4500 +int rpivid_hw_probe(struct rpivid_dev *dev);
4501 +void rpivid_hw_remove(struct rpivid_dev *dev);
4502 +
4503 +#endif
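The apb_write_vc_addr() and apb_write_vc_len() helpers in this header shift their argument right by 6, so the hardware sees addresses and lengths in 64-byte units. The sketch below isolates just those value computations so the rounding is easy to check. The function names vc_addr_units and vc_len_units are hypothetical stand-ins, and uint64_t is used in place of dma_addr_t so the block compiles outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the value written by apb_write_vc_addr():
 * the address in 64-byte units. The low 6 bits are discarded, so a
 * buffer that is not 64-byte aligned loses its offset.
 */
uint32_t vc_addr_units(uint64_t dma_addr)
{
	return (uint32_t)(dma_addr >> 6);
}

/* Hypothetical stand-in for the value written by apb_write_vc_len():
 * the length rounded up to a whole number of 64-byte units.
 */
uint32_t vc_len_units(unsigned int len_bytes)
{
	return (len_bytes + 63) >> 6;
}
```

For example, an address of 0x1000 becomes 0x40, while lengths of 1, 64 and 65 bytes become 1, 1 and 2 units respectively.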
4504 --- /dev/null
4505 +++ b/drivers/staging/media/rpivid/rpivid_video.c
4506 @@ -0,0 +1,696 @@
4507 +// SPDX-License-Identifier: GPL-2.0
4508 +/*
4509 + * Raspberry Pi HEVC driver
4510 + *
4511 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
4512 + *
4513 + * Based on the Cedrus VPU driver, that is:
4514 + *
4515 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
4516 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
4517 + * Copyright (C) 2018 Bootlin
4518 + */
4519 +
4520 +#include <media/videobuf2-dma-contig.h>
4521 +#include <media/v4l2-device.h>
4522 +#include <media/v4l2-ioctl.h>
4523 +#include <media/v4l2-event.h>
4524 +#include <media/v4l2-mem2mem.h>
4525 +
4526 +#include "rpivid.h"
4527 +#include "rpivid_hw.h"
4528 +#include "rpivid_video.h"
4529 +#include "rpivid_dec.h"
4530 +
4531 +#define RPIVID_DECODE_SRC BIT(0)
4532 +#define RPIVID_DECODE_DST BIT(1)
4533 +
4534 +#define RPIVID_MIN_WIDTH 16U
4535 +#define RPIVID_MIN_HEIGHT 16U
4536 +#define RPIVID_DEFAULT_WIDTH 1920U
4537 +#define RPIVID_DEFAULT_HEIGHT 1088U
4538 +#define RPIVID_MAX_WIDTH 4096U
4539 +#define RPIVID_MAX_HEIGHT 4096U
4540 +
4541 +static inline struct rpivid_ctx *rpivid_file2ctx(struct file *file)
4542 +{
4543 + return container_of(file->private_data, struct rpivid_ctx, fh);
4544 +}
4545 +
4546 +/* constrain x to [y, y*2]; values outside that range fall back to y */
4547 +static inline unsigned int constrain2x(unsigned int x, unsigned int y)
4548 +{
4549 + return (x < y) ?
4550 + y :
4551 + (x > y * 2) ? y : x;
4552 +}
4553 +
4554 +size_t rpivid_round_up_size(const size_t x)
4555 +{
4556 + /* Admit no size < 256 */
4557 + const unsigned int n = x < 256 ? 8 : ilog2(x);
4558 +
4559 + return x >= (3 << n) ? 4 << n : (3 << n);
4560 +}
4561 +
4562 +size_t rpivid_bit_buf_size(unsigned int w, unsigned int h, unsigned int bits_minus8)
4563 +{
4564 + const size_t wxh = w * h;
4565 + size_t bits_alloc;
4566 +
4567 + /* Annex A gives a min compression of 2 @ lvl 3.1
4568 + * (wxh <= 983040) and min 4 thereafter but avoid
4569 + * the oddity of 983041 having a lower limit than
4570 + * 983040.
4571 + * Multiply by 3/2 for 4:2:0
4572 + */
4573 + bits_alloc = wxh < 983040 ? wxh * 3 / 4 :
4574 + wxh < 983040 * 2 ? 983040 * 3 / 4 :
4575 + wxh * 3 / 8;
4576 + /* Allow for bit depth */
4577 + bits_alloc += (bits_alloc * bits_minus8) / 8;
4578 + return rpivid_round_up_size(bits_alloc);
4579 +}
4580 +
4581 +void rpivid_prepare_src_format(struct v4l2_pix_format_mplane *pix_fmt)
4582 +{
4583 + size_t size;
4584 + u32 w;
4585 + u32 h;
4586 +
4587 + w = pix_fmt->width;
4588 + h = pix_fmt->height;
4589 + if (!w || !h) {
4590 + w = RPIVID_DEFAULT_WIDTH;
4591 + h = RPIVID_DEFAULT_HEIGHT;
4592 + }
4593 + if (w > RPIVID_MAX_WIDTH)
4594 + w = RPIVID_MAX_WIDTH;
4595 + if (h > RPIVID_MAX_HEIGHT)
4596 + h = RPIVID_MAX_HEIGHT;
4597 +
4598 + size = pix_fmt->plane_fmt[0].sizeimage;
4599 + if (!size || size > SZ_32M) {
4600 + /* Unspecified or way too big - pick max for size */
4601 + size = rpivid_bit_buf_size(w, h, 2);
4602 + }
4603 + /* Set a minimum */
4604 + size = max_t(size_t, SZ_4K, size);
4605 +
4606 + pix_fmt->pixelformat = V4L2_PIX_FMT_HEVC_SLICE;
4607 + pix_fmt->width = w;
4608 + pix_fmt->height = h;
4609 + pix_fmt->num_planes = 1;
4610 + pix_fmt->field = V4L2_FIELD_NONE;
4611 + /* Zero bytes per line for encoded source. */
4612 + pix_fmt->plane_fmt[0].bytesperline = 0;
4613 + pix_fmt->plane_fmt[0].sizeimage = size;
4614 +}
4615 +
4616 +/* Take any pix_format and make it valid */
4617 +static void rpivid_prepare_dst_format(struct v4l2_pix_format_mplane *pix_fmt)
4618 +{
4619 + unsigned int width = pix_fmt->width;
4620 + unsigned int height = pix_fmt->height;
4621 + unsigned int sizeimage = pix_fmt->plane_fmt[0].sizeimage;
4622 + unsigned int bytesperline = pix_fmt->plane_fmt[0].bytesperline;
4623 +
4624 + if (!width)
4625 + width = RPIVID_DEFAULT_WIDTH;
4626 + if (width > RPIVID_MAX_WIDTH)
4627 + width = RPIVID_MAX_WIDTH;
4628 + if (!height)
4629 + height = RPIVID_DEFAULT_HEIGHT;
4630 + if (height > RPIVID_MAX_HEIGHT)
4631 + height = RPIVID_MAX_HEIGHT;
4632 +
4633 + /* For column formats set bytesperline to column height (stride2) */
4634 + switch (pix_fmt->pixelformat) {
4635 + default:
4636 + pix_fmt->pixelformat = V4L2_PIX_FMT_NV12_COL128;
4637 + fallthrough;
4638 + case V4L2_PIX_FMT_NV12_COL128:
4639 + /* Width rounds up to columns */
4640 + width = ALIGN(width, 128);
4641 +
4642 + /* 16 aligned height - not sure we even need that */
4643 + height = ALIGN(height, 16);
4644 + /* column height
4645 + * Accept suggested shape if at least min & <= 2 * min
4646 + */
4647 + bytesperline = constrain2x(bytesperline, height * 3 / 2);
4648 +
4649 + /* image size
4650 + * Again allow plausible variation in case added padding is
4651 + * required
4652 + */
4653 + sizeimage = constrain2x(sizeimage, bytesperline * width);
4654 + break;
4655 +
4656 + case V4L2_PIX_FMT_NV12_10_COL128:
4657 + /* width in pixels (3 pels = 4 bytes) rounded to 128 byte
4658 + * columns
4659 + */
4660 + width = ALIGN(((width + 2) / 3), 32) * 3;
4661 +
4662 + /* 16-aligned height. */
4663 + height = ALIGN(height, 16);
4664 +
4665 + /* column height
4666 + * Accept suggested shape if at least min & <= 2 * min
4667 + */
4668 + bytesperline = constrain2x(bytesperline, height * 3 / 2);
4669 +
4670 + /* image size
4671 + * Again allow plausible variation in case added padding is
4672 + * required
4673 + */
4674 + sizeimage = constrain2x(sizeimage,
4675 + bytesperline * width * 4 / 3);
4676 + break;
4677 + }
4678 +
4679 + pix_fmt->width = width;
4680 + pix_fmt->height = height;
4681 +
4682 + pix_fmt->field = V4L2_FIELD_NONE;
4683 + pix_fmt->plane_fmt[0].bytesperline = bytesperline;
4684 + pix_fmt->plane_fmt[0].sizeimage = sizeimage;
4685 + pix_fmt->num_planes = 1;
4686 +}
4687 +
4688 +static int rpivid_querycap(struct file *file, void *priv,
4689 + struct v4l2_capability *cap)
4690 +{
4691 + strscpy(cap->driver, RPIVID_NAME, sizeof(cap->driver));
4692 + strscpy(cap->card, RPIVID_NAME, sizeof(cap->card));
4693 + snprintf(cap->bus_info, sizeof(cap->bus_info),
4694 + "platform:%s", RPIVID_NAME);
4695 +
4696 + return 0;
4697 +}
4698 +
4699 +static int rpivid_enum_fmt_vid_out(struct file *file, void *priv,
4700 + struct v4l2_fmtdesc *f)
4701 +{
4702 + // Input formats
4703 +
4704 + // H.265 Slice only currently
4705 + if (f->index == 0) {
4706 + f->pixelformat = V4L2_PIX_FMT_HEVC_SLICE;
4707 + return 0;
4708 + }
4709 +
4710 + return -EINVAL;
4711 +}
4712 +
4713 +static int rpivid_hevc_validate_sps(const struct v4l2_ctrl_hevc_sps * const sps)
4714 +{
4715 + const unsigned int ctb_log2_size_y =
4716 + sps->log2_min_luma_coding_block_size_minus3 + 3 +
4717 + sps->log2_diff_max_min_luma_coding_block_size;
4718 + const unsigned int min_tb_log2_size_y =
4719 + sps->log2_min_luma_transform_block_size_minus2 + 2;
4720 + const unsigned int max_tb_log2_size_y = min_tb_log2_size_y +
4721 + sps->log2_diff_max_min_luma_transform_block_size;
4722 +
4723 + /* Local limitations */
4724 + if (sps->pic_width_in_luma_samples < 32 ||
4725 + sps->pic_width_in_luma_samples > 4096)
4726 + return 0;
4727 + if (sps->pic_height_in_luma_samples < 32 ||
4728 + sps->pic_height_in_luma_samples > 4096)
4729 + return 0;
4730 + if (!(sps->bit_depth_luma_minus8 == 0 ||
4731 + sps->bit_depth_luma_minus8 == 2))
4732 + return 0;
4733 + if (sps->bit_depth_luma_minus8 != sps->bit_depth_chroma_minus8)
4734 + return 0;
4735 + if (sps->chroma_format_idc != 1)
4736 + return 0;
4737 +
4738 + /* Limits from H.265 7.4.3.2.1 */
4739 + if (sps->log2_max_pic_order_cnt_lsb_minus4 > 12)
4740 + return 0;
4741 + if (sps->sps_max_dec_pic_buffering_minus1 > 15)
4742 + return 0;
4743 + if (sps->sps_max_num_reorder_pics >
4744 + sps->sps_max_dec_pic_buffering_minus1)
4745 + return 0;
4746 + if (ctb_log2_size_y > 6)
4747 + return 0;
4748 + if (max_tb_log2_size_y > 5)
4749 + return 0;
4750 + if (max_tb_log2_size_y > ctb_log2_size_y)
4751 + return 0;
4752 + if (sps->max_transform_hierarchy_depth_inter >
4753 + (ctb_log2_size_y - min_tb_log2_size_y))
4754 + return 0;
4755 + if (sps->max_transform_hierarchy_depth_intra >
4756 + (ctb_log2_size_y - min_tb_log2_size_y))
4757 + return 0;
4758 + /* Check pcm stuff */
4759 + if (sps->num_short_term_ref_pic_sets > 64)
4760 + return 0;
4761 + if (sps->num_long_term_ref_pics_sps > 32)
4762 + return 0;
4763 + return 1;
4764 +}
4765 +
4766 +static inline int is_sps_set(const struct v4l2_ctrl_hevc_sps * const sps)
4767 +{
4768 + return sps && sps->pic_width_in_luma_samples != 0;
4769 +}
4770 +
4771 +static u32 pixelformat_from_sps(const struct v4l2_ctrl_hevc_sps * const sps,
4772 + const int index)
4773 +{
4774 + u32 pf = 0;
4775 +
4776 + if (!is_sps_set(sps) || !rpivid_hevc_validate_sps(sps)) {
4777 + /* Treat this as an error? For now return both */
4778 + if (index == 0)
4779 + pf = V4L2_PIX_FMT_NV12_COL128;
4780 + else if (index == 1)
4781 + pf = V4L2_PIX_FMT_NV12_10_COL128;
4782 + } else if (index == 0) {
4783 + if (sps->bit_depth_luma_minus8 == 0)
4784 + pf = V4L2_PIX_FMT_NV12_COL128;
4785 + else if (sps->bit_depth_luma_minus8 == 2)
4786 + pf = V4L2_PIX_FMT_NV12_10_COL128;
4787 + }
4788 +
4789 + return pf;
4790 +}
4791 +
4792 +static struct v4l2_pix_format_mplane
4793 +rpivid_hevc_default_dst_fmt(struct rpivid_ctx * const ctx)
4794 +{
4795 + const struct v4l2_ctrl_hevc_sps * const sps =
4796 + rpivid_find_control_data(ctx, V4L2_CID_STATELESS_HEVC_SPS);
4797 + struct v4l2_pix_format_mplane pix_fmt;
4798 +
4799 + memset(&pix_fmt, 0, sizeof(pix_fmt));
4800 + if (is_sps_set(sps)) {
4801 + pix_fmt.width = sps->pic_width_in_luma_samples;
4802 + pix_fmt.height = sps->pic_height_in_luma_samples;
4803 + pix_fmt.pixelformat = pixelformat_from_sps(sps, 0);
4804 + }
4805 +
4806 + rpivid_prepare_dst_format(&pix_fmt);
4807 + return pix_fmt;
4808 +}
4809 +
4810 +static u32 rpivid_hevc_get_dst_pixelformat(struct rpivid_ctx * const ctx,
4811 + const int index)
4812 +{
4813 + const struct v4l2_ctrl_hevc_sps * const sps =
4814 + rpivid_find_control_data(ctx, V4L2_CID_STATELESS_HEVC_SPS);
4815 +
4816 + return pixelformat_from_sps(sps, index);
4817 +}
4818 +
4819 +static int rpivid_enum_fmt_vid_cap(struct file *file, void *priv,
4820 + struct v4l2_fmtdesc *f)
4821 +{
4822 + struct rpivid_ctx * const ctx = rpivid_file2ctx(file);
4823 +
4824 + const u32 pf = rpivid_hevc_get_dst_pixelformat(ctx, f->index);
4825 +
4826 + if (pf == 0)
4827 + return -EINVAL;
4828 +
4829 + f->pixelformat = pf;
4830 + return 0;
4831 +}
4832 +
4833 +/*
4834 + * get dst format - sets it to default if otherwise unset
4835 + * returns a pointer to the struct as a convenience
4836 + */
4837 +static struct v4l2_pix_format_mplane *get_dst_fmt(struct rpivid_ctx *const ctx)
4838 +{
4839 + if (!ctx->dst_fmt_set)
4840 + ctx->dst_fmt = rpivid_hevc_default_dst_fmt(ctx);
4841 + return &ctx->dst_fmt;
4842 +}
4843 +
4844 +static int rpivid_g_fmt_vid_cap(struct file *file, void *priv,
4845 + struct v4l2_format *f)
4846 +{
4847 + struct rpivid_ctx *ctx = rpivid_file2ctx(file);
4848 +
4849 + f->fmt.pix_mp = *get_dst_fmt(ctx);
4850 + return 0;
4851 +}
4852 +
4853 +static int rpivid_g_fmt_vid_out(struct file *file, void *priv,
4854 + struct v4l2_format *f)
4855 +{
4856 + struct rpivid_ctx *ctx = rpivid_file2ctx(file);
4857 +
4858 + f->fmt.pix_mp = ctx->src_fmt;
4859 + return 0;
4860 +}
4861 +
4862 +static inline void copy_color(struct v4l2_pix_format_mplane *d,
4863 + const struct v4l2_pix_format_mplane *s)
4864 +{
4865 + d->colorspace = s->colorspace;
4866 + d->xfer_func = s->xfer_func;
4867 + d->ycbcr_enc = s->ycbcr_enc;
4868 + d->quantization = s->quantization;
4869 +}
4870 +
4871 +static int rpivid_try_fmt_vid_cap(struct file *file, void *priv,
4872 + struct v4l2_format *f)
4873 +{
4874 + struct rpivid_ctx *ctx = rpivid_file2ctx(file);
4875 + const struct v4l2_ctrl_hevc_sps * const sps =
4876 + rpivid_find_control_data(ctx, V4L2_CID_STATELESS_HEVC_SPS);
4877 + u32 pixelformat;
4878 + int i;
4879 +
4880 + for (i = 0; (pixelformat = pixelformat_from_sps(sps, i)) != 0; i++) {
4881 + if (f->fmt.pix_mp.pixelformat == pixelformat)
4882 + break;
4883 + }
4884 +
4885 + // We don't have any way of finding out colourspace so believe
4886 + // anything we are told - take anything set in src as a default
4887 + if (f->fmt.pix_mp.colorspace == V4L2_COLORSPACE_DEFAULT)
4888 + copy_color(&f->fmt.pix_mp, &ctx->src_fmt);
4889 +
4890 + f->fmt.pix_mp.pixelformat = pixelformat;
4891 + rpivid_prepare_dst_format(&f->fmt.pix_mp);
4892 + return 0;
4893 +}
4894 +
4895 +static int rpivid_try_fmt_vid_out(struct file *file, void *priv,
4896 + struct v4l2_format *f)
4897 +{
4898 + rpivid_prepare_src_format(&f->fmt.pix_mp);
4899 + return 0;
4900 +}
4901 +
4902 +static int rpivid_s_fmt_vid_cap(struct file *file, void *priv,
4903 + struct v4l2_format *f)
4904 +{
4905 + struct rpivid_ctx *ctx = rpivid_file2ctx(file);
4906 + struct vb2_queue *vq;
4907 + int ret;
4908 +
4909 + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
4910 + if (vb2_is_busy(vq))
4911 + return -EBUSY;
4912 +
4913 + ret = rpivid_try_fmt_vid_cap(file, priv, f);
4914 + if (ret)
4915 + return ret;
4916 +
4917 + ctx->dst_fmt = f->fmt.pix_mp;
4918 + ctx->dst_fmt_set = 1;
4919 +
4920 + return 0;
4921 +}
4922 +
4923 +static int rpivid_s_fmt_vid_out(struct file *file, void *priv,
4924 + struct v4l2_format *f)
4925 +{
4926 + struct rpivid_ctx *ctx = rpivid_file2ctx(file);
4927 + struct vb2_queue *vq;
4928 + int ret;
4929 +
4930 + vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
4931 + if (vb2_is_busy(vq))
4932 + return -EBUSY;
4933 +
4934 + ret = rpivid_try_fmt_vid_out(file, priv, f);
4935 + if (ret)
4936 + return ret;
4937 +
4938 + ctx->src_fmt = f->fmt.pix_mp;
4939 + ctx->dst_fmt_set = 0; // Setting src invalidates dst
4940 +
4941 + vq->subsystem_flags |=
4942 + VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF;
4943 +
4944 + /* Propagate colorspace information to capture. */
4945 + copy_color(&ctx->dst_fmt, &f->fmt.pix_mp);
4946 + return 0;
4947 +}
4948 +
4949 +const struct v4l2_ioctl_ops rpivid_ioctl_ops = {
4950 + .vidioc_querycap = rpivid_querycap,
4951 +
4952 + .vidioc_enum_fmt_vid_cap = rpivid_enum_fmt_vid_cap,
4953 + .vidioc_g_fmt_vid_cap_mplane = rpivid_g_fmt_vid_cap,
4954 + .vidioc_try_fmt_vid_cap_mplane = rpivid_try_fmt_vid_cap,
4955 + .vidioc_s_fmt_vid_cap_mplane = rpivid_s_fmt_vid_cap,
4956 +
4957 + .vidioc_enum_fmt_vid_out = rpivid_enum_fmt_vid_out,
4958 + .vidioc_g_fmt_vid_out_mplane = rpivid_g_fmt_vid_out,
4959 + .vidioc_try_fmt_vid_out_mplane = rpivid_try_fmt_vid_out,
4960 + .vidioc_s_fmt_vid_out_mplane = rpivid_s_fmt_vid_out,
4961 +
4962 + .vidioc_reqbufs = v4l2_m2m_ioctl_reqbufs,
4963 + .vidioc_querybuf = v4l2_m2m_ioctl_querybuf,
4964 + .vidioc_qbuf = v4l2_m2m_ioctl_qbuf,
4965 + .vidioc_dqbuf = v4l2_m2m_ioctl_dqbuf,
4966 + .vidioc_prepare_buf = v4l2_m2m_ioctl_prepare_buf,
4967 + .vidioc_create_bufs = v4l2_m2m_ioctl_create_bufs,
4968 + .vidioc_expbuf = v4l2_m2m_ioctl_expbuf,
4969 +
4970 + .vidioc_streamon = v4l2_m2m_ioctl_streamon,
4971 + .vidioc_streamoff = v4l2_m2m_ioctl_streamoff,
4972 +
4973 + .vidioc_try_decoder_cmd = v4l2_m2m_ioctl_stateless_try_decoder_cmd,
4974 + .vidioc_decoder_cmd = v4l2_m2m_ioctl_stateless_decoder_cmd,
4975 +
4976 + .vidioc_subscribe_event = v4l2_ctrl_subscribe_event,
4977 + .vidioc_unsubscribe_event = v4l2_event_unsubscribe,
4978 +};
4979 +
4980 +static int rpivid_queue_setup(struct vb2_queue *vq, unsigned int *nbufs,
4981 + unsigned int *nplanes, unsigned int sizes[],
4982 + struct device *alloc_devs[])
4983 +{
4984 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vq);
4985 + struct v4l2_pix_format_mplane *pix_fmt;
4986 +
4987 + if (V4L2_TYPE_IS_OUTPUT(vq->type))
4988 + pix_fmt = &ctx->src_fmt;
4989 + else
4990 + pix_fmt = get_dst_fmt(ctx);
4991 +
4992 + if (*nplanes) {
4993 + if (sizes[0] < pix_fmt->plane_fmt[0].sizeimage)
4994 + return -EINVAL;
4995 + } else {
4996 + sizes[0] = pix_fmt->plane_fmt[0].sizeimage;
4997 + *nplanes = 1;
4998 + }
4999 +
5000 + return 0;
5001 +}
5002 +
5003 +static void rpivid_queue_cleanup(struct vb2_queue *vq, u32 state)
5004 +{
5005 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vq);
5006 + struct vb2_v4l2_buffer *vbuf;
5007 +
5008 + for (;;) {
5009 + if (V4L2_TYPE_IS_OUTPUT(vq->type))
5010 + vbuf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
5011 + else
5012 + vbuf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
5013 +
5014 + if (!vbuf)
5015 + return;
5016 +
5017 + v4l2_ctrl_request_complete(vbuf->vb2_buf.req_obj.req,
5018 + &ctx->hdl);
5019 + v4l2_m2m_buf_done(vbuf, state);
5020 + }
5021 +}
5022 +
5023 +static int rpivid_buf_out_validate(struct vb2_buffer *vb)
5024 +{
5025 + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
5026 +
5027 + vbuf->field = V4L2_FIELD_NONE;
5028 + return 0;
5029 +}
5030 +
5031 +static int rpivid_buf_prepare(struct vb2_buffer *vb)
5032 +{
5033 + struct vb2_queue *vq = vb->vb2_queue;
5034 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vq);
5035 + struct v4l2_pix_format_mplane *pix_fmt;
5036 +
5037 + if (V4L2_TYPE_IS_OUTPUT(vq->type))
5038 + pix_fmt = &ctx->src_fmt;
5039 + else
5040 + pix_fmt = &ctx->dst_fmt;
5041 +
5042 + if (vb2_plane_size(vb, 0) < pix_fmt->plane_fmt[0].sizeimage)
5043 + return -EINVAL;
5044 +
5045 + vb2_set_plane_payload(vb, 0, pix_fmt->plane_fmt[0].sizeimage);
5046 +
5047 + return 0;
5048 +}
5049 +
5050 +/* Only stops the clock if streaming is off on both output & capture */
5051 +static void stop_clock(struct rpivid_dev *dev, struct rpivid_ctx *ctx)
5052 +{
5053 + if (ctx->src_stream_on ||
5054 + ctx->dst_stream_on)
5055 + return;
5056 +
5057 + clk_set_min_rate(dev->clock, 0);
5058 + clk_disable_unprepare(dev->clock);
5059 +}
5060 +
5061 +/* Always starts the clock if it isn't already on this ctx */
5062 +static int start_clock(struct rpivid_dev *dev, struct rpivid_ctx *ctx)
5063 +{
5064 + int rv;
5065 +
5066 + rv = clk_set_min_rate(dev->clock, dev->max_clock_rate);
5067 + if (rv) {
5068 + dev_err(dev->dev, "Failed to set clock rate\n");
5069 + return rv;
5070 + }
5071 +
5072 + rv = clk_prepare_enable(dev->clock);
5073 + if (rv) {
5074 + dev_err(dev->dev, "Failed to enable clock\n");
5075 + return rv;
5076 + }
5077 +
5078 + return 0;
5079 +}
5080 +
5081 +static int rpivid_start_streaming(struct vb2_queue *vq, unsigned int count)
5082 +{
5083 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vq);
5084 + struct rpivid_dev *dev = ctx->dev;
5085 + int ret = 0;
5086 +
5087 + if (!V4L2_TYPE_IS_OUTPUT(vq->type)) {
5088 + ctx->dst_stream_on = 1;
5089 + goto ok;
5090 + }
5091 +
5092 + if (ctx->src_fmt.pixelformat != V4L2_PIX_FMT_HEVC_SLICE) {
5093 + ret = -EINVAL;
5094 + goto fail_cleanup;
5095 + }
5096 +
5097 + if (ctx->src_stream_on)
5098 + goto ok;
5099 +
5100 + ret = start_clock(dev, ctx);
5101 + if (ret)
5102 + goto fail_cleanup;
5103 +
5104 + if (dev->dec_ops->start)
5105 + ret = dev->dec_ops->start(ctx);
5106 + if (ret)
5107 + goto fail_stop_clock;
5108 +
5109 + ctx->src_stream_on = 1;
5110 +ok:
5111 + return 0;
5112 +
5113 +fail_stop_clock:
5114 + stop_clock(dev, ctx);
5115 +fail_cleanup:
5116 + v4l2_err(&dev->v4l2_dev, "%s: qtype=%d: FAIL\n", __func__, vq->type);
5117 + rpivid_queue_cleanup(vq, VB2_BUF_STATE_QUEUED);
5118 + return ret;
5119 +}
5120 +
5121 +static void rpivid_stop_streaming(struct vb2_queue *vq)
5122 +{
5123 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vq);
5124 + struct rpivid_dev *dev = ctx->dev;
5125 +
5126 + if (V4L2_TYPE_IS_OUTPUT(vq->type)) {
5127 + ctx->src_stream_on = 0;
5128 + if (dev->dec_ops->stop)
5129 + dev->dec_ops->stop(ctx);
5130 + } else {
5131 + ctx->dst_stream_on = 0;
5132 + }
5133 +
5134 + rpivid_queue_cleanup(vq, VB2_BUF_STATE_ERROR);
5135 +
5136 + vb2_wait_for_all_buffers(vq);
5137 +
5138 + stop_clock(dev, ctx);
5139 +}
5140 +
5141 +static void rpivid_buf_queue(struct vb2_buffer *vb)
5142 +{
5143 + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
5144 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
5145 +
5146 + v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
5147 +}
5148 +
5149 +static void rpivid_buf_request_complete(struct vb2_buffer *vb)
5150 +{
5151 + struct rpivid_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
5152 +
5153 + v4l2_ctrl_request_complete(vb->req_obj.req, &ctx->hdl);
5154 +}
5155 +
5156 +static struct vb2_ops rpivid_qops = {
5157 + .queue_setup = rpivid_queue_setup,
5158 + .buf_prepare = rpivid_buf_prepare,
5159 + .buf_queue = rpivid_buf_queue,
5160 + .buf_out_validate = rpivid_buf_out_validate,
5161 + .buf_request_complete = rpivid_buf_request_complete,
5162 + .start_streaming = rpivid_start_streaming,
5163 + .stop_streaming = rpivid_stop_streaming,
5164 + .wait_prepare = vb2_ops_wait_prepare,
5165 + .wait_finish = vb2_ops_wait_finish,
5166 +};
5167 +
5168 +int rpivid_queue_init(void *priv, struct vb2_queue *src_vq,
5169 + struct vb2_queue *dst_vq)
5170 +{
5171 + struct rpivid_ctx *ctx = priv;
5172 + int ret;
5173 +
5174 + src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
5175 + src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
5176 + src_vq->drv_priv = ctx;
5177 + src_vq->buf_struct_size = sizeof(struct rpivid_buffer);
5178 + src_vq->ops = &rpivid_qops;
5179 + src_vq->mem_ops = &vb2_dma_contig_memops;
5180 + src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
5181 + src_vq->lock = &ctx->ctx_mutex;
5182 + src_vq->dev = ctx->dev->dev;
5183 + src_vq->supports_requests = true;
5184 + src_vq->requires_requests = true;
5185 +
5186 + ret = vb2_queue_init(src_vq);
5187 + if (ret)
5188 + return ret;
5189 +
5190 + dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
5191 + dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
5192 + dst_vq->drv_priv = ctx;
5193 + dst_vq->buf_struct_size = sizeof(struct rpivid_buffer);
5194 + dst_vq->min_buffers_needed = 1;
5195 + dst_vq->ops = &rpivid_qops;
5196 + dst_vq->mem_ops = &vb2_dma_contig_memops;
5197 + dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
5198 + dst_vq->lock = &ctx->ctx_mutex;
5199 + dst_vq->dev = ctx->dev->dev;
5200 +
5201 + return vb2_queue_init(dst_vq);
5202 +}
5203 --- /dev/null
5204 +++ b/drivers/staging/media/rpivid/rpivid_video.h
5205 @@ -0,0 +1,33 @@
5206 +/* SPDX-License-Identifier: GPL-2.0 */
5207 +/*
5208 + * Raspberry Pi HEVC driver
5209 + *
5210 + * Copyright (C) 2020 Raspberry Pi (Trading) Ltd
5211 + *
5212 + * Based on the Cedrus VPU driver, that is:
5213 + *
5214 + * Copyright (C) 2016 Florent Revest <florent.revest@free-electrons.com>
5215 + * Copyright (C) 2018 Paul Kocialkowski <paul.kocialkowski@bootlin.com>
5216 + * Copyright (C) 2018 Bootlin
5217 + */
5218 +
5219 +#ifndef _RPIVID_VIDEO_H_
5220 +#define _RPIVID_VIDEO_H_
5221 +
5222 +struct rpivid_format {
5223 + u32 pixelformat;
5224 + u32 directions;
5225 + unsigned int capabilities;
5226 +};
5227 +
5228 +extern const struct v4l2_ioctl_ops rpivid_ioctl_ops;
5229 +
5230 +int rpivid_queue_init(void *priv, struct vb2_queue *src_vq,
5231 + struct vb2_queue *dst_vq);
5232 +
5233 +size_t rpivid_bit_buf_size(unsigned int w, unsigned int h, unsigned int bits_minus8);
5234 +size_t rpivid_round_up_size(const size_t x);
5235 +
5236 +void rpivid_prepare_src_format(struct v4l2_pix_format_mplane *pix_fmt);
5237 +
5238 +#endif