Hello everyone, this is (basically) my first post. Sorry, it's a long one. I've developed a cross-platform C app for low-latency desktop streaming via H.264, and I'm trying to bring it to the Pi. I'm using the OpenMAX IL and ilclient (thanks for this!) libraries provided with Raspbian. Most of what I've done is based on the /opt/vc/src/hello_pi/hello_video/video.c file found on Raspbian.
The server program sends a stream of encoded H.264 frames at precisely 60 fps over TCP to the client. In the Windows and Mac OS versions of the client, I use platform-specific hardware decoder libraries, then draw the raw frames with the SDL2 library (which uses OpenGL under the hood).
Based on the video.c example mentioned above, I have a working version on the Pi, but not without issues. FYI, I have removed the "clock" and "video_scheduler" components, as they seemed to cause choppier video compared to tunneling the "video_decode" component's output port directly to the "video_render" input port. Testing was done on a Pi 3 with the current unmodified version (2016-05-10) of Raspbian Jessie.
My questions / issues are as follows:
1. Main issue: if I set the H.264 quality low, e.g. a QP of 36 for 1080p@60, the Pi is able to handle the incoming data smoothly. The average bitrate in this scenario is 2-3 Mbps. If the quality is around 30 QP, things start breaking. The resulting bitrate increases to 5-6 Mbps, with occasional spikes of around 10 Mbps. The strange thing is, it's not the ilclient or OMX libraries returning errors here; I am getting packet loss on TCP. The video is sporadically choppy, with occasional, noticeable delays between frames. Eventually, when trying to read the size header I send with every frame, the value is incorrect (i.e. a negative value), indicating packet loss, and the program exits. This may be more of a networking issue than an OpenMAX issue, but I know the Pi can support higher bandwidth than 10 Mbps out of the box (I am using the built-in wired connection). I have tinkered with every sysctl net.core and net.ipv4.tcp* setting I can imagine, to no avail. My theory is that there is a bottleneck on a shared bus affecting the TCP reads. CPU usage during the issue reads at a low 15-25% (of a single core). Overclocking the Pi with gpu_freq, force_turbo, and over_voltage did not seem to help either. The TCP reads occur in a separate thread and are not blocked by other processing. The voltage is stable and I am not getting voltage or temperature warnings. Any ideas here? Update: I tested the program without any OpenMAX decoding or rendering, and the issue remains. It must be a networking bottleneck of some kind...
2. Is the "video_render" OMX component the best choice for presenting frames, or is it possible to use SDL2 w/ OpenGL and achieve similar performance? If SDL2 is an option, what is the default raw pixel format of the frames that come out of the "video_decode" component, and can I modify the pixel format?
3. Is there a way to turn vsync off for the "video_render" component?
4. Is there a way to increase the buffer size obtained through ilclient_get_input_buffer? The buffers seem to default to 80 KB.
5. I'm interested in setting OMX_DataUnitCodedPicture to potentially reduce latency, as I read frames from the network one frame at a time, but I'm not sure how to set it via OMX_IndexParamBrcmDataUnit. Are there any examples out there?
6. Any other tips for reducing potential buffering / latency in the decoding & rendering process?
Thanks in advance!
Chris
1 - TCP should never lose data, but it may insert long delays if it has to retry frames. UDP is not a guaranteed service, so the application layer needs to handle reordered buffers and potential retries. It sounds like a networking issue.
2 - If you're dealing with multimedia buffers from video_decode or similar, then video_render is your best bet. GLES has to convert to RGB before processing the buffers, and I suspect SDL would have to compose on the ARM.
3 - What are you thinking that would achieve other than tearing? video_render has an input FIFO of depth 1. If a newer buffer is presented before it has submitted the old one to dispmanx, then the older one is dropped.
4 - You should be able to increase it by doing OMX_SetParameter((*comp)->comp, OMX_IndexParamPortDefinition, portdef) with nBufferSize having a larger value before you enable the port. 80 KB is the default. Likewise you can increase the number of buffers if that helps unblock your app.
5 - I can't remember the detail.
6 - Submit encoded frames one per buffer, and therefore ensure you set nFlags appropriately with at least OMX_BUFFERFLAG_ENDOFFRAME. That saves the codec searching the stream for start codes.
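To make point 6 a bit more concrete, here is a minimal sketch of the bookkeeping when a frame is larger than one input buffer: the frame is split across consecutive buffers and only the buffer holding the last bytes of the frame gets the end-of-frame flag. The `sketch_buffer` struct, `SKETCH_FLAG_ENDOFFRAME` and `fill_next_chunk` are hypothetical stand-ins so the example compiles without the OMX headers. In a real client the buffer would presumably be the OMX_BUFFERHEADERTYPE returned by ilclient_get_input_buffer, and the flag would be OMX_BUFFERFLAG_ENDOFFRAME.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for OMX_BUFFERFLAG_ENDOFFRAME; use the real macro with OMX. */
#define SKETCH_FLAG_ENDOFFRAME 0x10u

/* Hypothetical stand-in for the OMX_BUFFERHEADERTYPE fields used here. */
typedef struct {
    uint8_t *pBuffer;    /* payload storage                */
    uint32_t nAllocLen;  /* capacity of pBuffer in bytes   */
    uint32_t nFilledLen; /* bytes of valid data in pBuffer */
    uint32_t nFlags;     /* per-buffer flags               */
} sketch_buffer;

/*
 * Copy the next chunk of one encoded frame into buf, starting at offset.
 * The end-of-frame flag is set only when this chunk completes the frame.
 * Returns the number of bytes consumed (0 if nothing is left to copy).
 */
size_t fill_next_chunk(sketch_buffer *buf, const uint8_t *frame,
                       size_t frame_len, size_t offset)
{
    buf->nFilledLen = 0;
    buf->nFlags = 0;
    if (offset >= frame_len)
        return 0;

    size_t remaining = frame_len - offset;
    size_t n = remaining < buf->nAllocLen ? remaining : buf->nAllocLen;

    memcpy(buf->pBuffer, frame + offset, n);
    buf->nFilledLen = (uint32_t)n;
    if (offset + n == frame_len)
        buf->nFlags = SKETCH_FLAG_ENDOFFRAME;
    return n;
}
```

If nBufferSize is raised as in point 4 so that every frame fits in a single buffer, the loop around this helper runs once per frame and every submitted buffer carries the flag.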
Adding the clock and video_scheduler should result in smoother playback, but you'll need to insert a delay in order to avoid frames appearing late. There is a distinct possibility of source clock drift though, and there is no sensible way to sync them, so it may be more hassle than it's worth.
1080p60 is above the level 4 or 4.1 (I forget which) that the codec is specified to support. It's there on a "best efforts" basis, and you may need to overclock in an attempt to achieve it.
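On point 1: since TCP does not drop bytes, a size header that comes out negative usually means the reader has lost track of the frame boundaries, for example by treating a short recv() as a full read. Nothing in the thread confirms that this is what happens in the client, so take the following as a sketch of one thing worth ruling out. It assumes a 4-byte big-endian length prefix before each frame, which is a guess about the wire format, and `recv_all` and `read_frame` are hypothetical helper names.

```c
#include <arpa/inet.h>   /* ntohl */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes from fd. Returns 0 on success, -1 on error or
 * if the peer closes the connection before len bytes have arrived. */
int recv_all(int fd, void *dst, size_t len)
{
    uint8_t *p = dst;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;            /* connection closed early */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted, just retry */
            return -1;
        }
        p += n;                   /* short read: keep going */
        len -= (size_t)n;
    }
    return 0;
}

/* Read one length-prefixed frame into buf (capacity cap).
 * Assumed framing: 4-byte big-endian size, then the payload.
 * Returns the payload size, or -1 on error or an implausible size. */
int32_t read_frame(int fd, uint8_t *buf, size_t cap)
{
    uint8_t hdr[4];
    uint32_t be;

    if (recv_all(fd, hdr, sizeof hdr) != 0)
        return -1;
    memcpy(&be, hdr, sizeof be);
    uint32_t size = ntohl(be);

    if (size == 0 || size > cap || size > INT32_MAX)
        return -1;                /* stream is out of sync or corrupt */
    if (recv_all(fd, buf, size) != 0)
        return -1;
    return (int32_t)size;
}
```

At higher bitrates frames are bigger and more likely to arrive in several pieces, which would fit the symptom appearing only at lower QP values, but that is speculation until the read path has been checked.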